Abstract

Consider a source that produces independent copies of a triplet of jointly distributed
random variables, $\{X_i, Y_i, Z_i\}_{i=1}^{\infty}$. The process $\{X_i\}$ is observed at the encoder, and
is supposed to be reproduced at two decoders, decoder Y and decoder Z, where $\{Y_i\}$
and $\{Z_i\}$ are observed, respectively, in either a causal or non-causal manner. The
communication between the encoder and the decoders is carried in two successive stages. 
In the first stage, the transmission is available to both decoders and they reconstruct 
the source according to the received bit-stream and the individual side information 
($\{Z_i\}$ or $\{Y_i\}$). In the second stage, additional information is sent to both decoders and
they refine the reconstructions of the source according to the available side information 
and the transmissions at both stages. It is desired to find the necessary and sufficient 
conditions on the communication rates between the encoder and decoders, so that the 
distortions incurred (at each stage) will not exceed given thresholds. For the case of non- 
degraded causal side information at the decoders, an exact single-letter characterization 
of the achievable region is derived for the case of pure source-coding. Then, for the 
case of communication between the encoder and decoders carried over independent 
memoryless discrete channels with random states known causally/non-causally at the
encoder and with causal side information about the source at the decoders, a single-letter
characterization of all achievable distortions in both stages is provided and it is shown
that the separation theorem holds. Finally, for non-causal degraded side information, 
inner and outer bounds to the achievable rate-distortion region are derived. These 
bounds are shown to be tight for certain cases of reconstruction requirements at the 
decoders, thereby shedding some light on the problem of successive refinement with
non-degraded side information at the decoders.

Index terms - causal/non-causal side information, channel capacity, degraded side- 
information, joint source-channel coding, separation theorem, source coding, successive 
refinement. 



1 Introduction 

We consider an instance of the multiple description problem, which is successive refine- 
ment (SR) of information. The term "successive refinement of information" is applicable 
to systems where the reconstruction of the source is done in a number of stages. In such
systems, a source is encoded by a single encoder which communicates with either a single 
decoder or a number of decoders in a successive manner. At each stage, the encoder sends 
some amount of information about the source to the decoder of that stage, which also has 
access to all previous transmissions. The decoder bases its reconstruction on all available 
transmissions, and, possibly, on some additional side information (SI). The quality of recon- 
struction at each stage (at each decoder) is measured with respect to a predefined distortion 
measure. In the case of pure source coding, the information transmitted by the encoder at 
each stage arrives at the decoder noiselessly, while in the case of noisy channels connecting 
the encoder and decoders, the transmission received at the decoder is corrupted and thus, 
joint source-channel coding should be applied. 

A number of works have dealt with the problem of successive refinement [1]-[4], and the
related problem of hierarchical coding [5]-[7]. In [3], the problem of successive source coding
was studied for the Wyner-Ziv setting, i.e., when SI is available to each decoder non-causally
[8]. The encoder transmits a source sequence, X, to two decoders in two successive stages.
Necessary and sufficient conditions were provided in [3], in terms of single-letter formulas,
for the achievability of information per-stage rates corresponding to given distortion levels 
of each communication step. For the case of identical SI available at all decoders, the two- 
stage coding scheme was extended to include any finite number of stages. Also, conditions 
for a source to be successively refinable with degraded SI were introduced in [3] for the 
two-stage case. Generally speaking, the notion of degraded SI means that the quality of SI 
available at the decoders of later stages is better than that of earlier stages. 

In [6], the problem of successive refinement with SI available non-causally at each de- 
coder was studied from a different viewpoint. Instead of considering per-stage communica- 
tion rates, the analysis of successive refinement was performed with respect to cumulative 
(sum-) rates achievable at each stage, under per-stage source restoration assumptions. A 
single-letter characterization of the achievable region with successive coding sum-rates and 
distortions was provided for the case of degraded SI at the decoders. It turned out that 
when the rate-sums are analyzed, it is possible to characterize an achievable rate-distortion 
region for any number of stages as long as the SI at the decoders is degraded. 

In [7], the problem of successive refinement was investigated for the case of SI available
causally at the decoders. It turned out that, unlike the above described non-causal settings,
when SI is available causally, the characterization of the achievable per-stage rate-distortion
region is possible without constraining SI to be degraded. 

The works reported in the field of successive refinement thus far have considered refine- 
ment of information when the transmission at each stage has been addressed to a single 
decoder. There are, however, many applications where a single encoder conveys informa- 
tion to several decoders in a single transmission. Heegard and Berger [9] and Kaspi [13] 
studied independently the following scenario: a single encoder communicates via a single 
transmission with two decoders, one of which accesses the transmission only, while the other
has a non-causal access to some SI correlated with the source. The source sequence should 
be reconstructed at both decoders with a certain accuracy and, under these distortion con- 
straints, it is desired to reduce the communication rate as much as possible. 

The minimum achievable communication rate, i.e., the rate-distortion function obtained
for this setup, is referred to as the Heegard-Berger rate-distortion function. It was also
extended in [9] to include a coding theorem for more than two decoders, each having access
to a different SI with a degraded structure. Now, assume that there is a demand for a better 
reconstruction at either one or both decoders, i.e., the source is required to perform a multi- 
level successive refinement, still communicating with all decoders via a single transmission. 
A question of obvious interest is the following: is it possible to characterize the achievable 
rate-distortion region for this generalized problem of successive refinement? 

In this work, we jointly extend the works of [9], [3], [6] and [7]. Specifically, we study the
scenario of two-decoder, two-stage successive refinement of information, with SI available
at all decoders in either a causal¹ or non-causal manner. For the causal case, we provide
a single-letter characterization of the achievable rate-distortion region, which is straightfor- 
wardly extendable to any number of decoders accessible in each stage and any finite number 
of stages. For the case of non-causal SI, we provide inner and outer bounds to the achievable 
rate-distortion region for the case of degraded SI. Note that although the SI is degraded at 
each stage, when both stages are viewed jointly, SI is no longer degraded (same SI is used at 
both stages and thus it is no longer possible to say that at the later stage the SI is of better



1 There are a few reasons for our interest in the scenario of causal SI at the decoders. The first motivation
is an attempt to include the concept of SR in zero-delay sequential coding systems. Schemes with causal
SI can also be viewed as denoising systems, where each decoder performs SI sequential filtering with the
aid of rate-constrained information provided by the encoder. Introducing SR to such systems is of practical 
importance, as it simplifies the decoding process in the sense of performing denoising of the SI symbols 
causally, in a number of steps, rather than using the entire SI sequence. 
quality), and therefore this setting is of particular interest. When considering the case of
causal SI, we provide the exact achievable region in terms of the per-stage rates, while for
the case of non-causal SI, we refer to the sum-rates. The difficulty in characterizing the
per-stage rates for a general scheme here is similar to that faced in [3].

For the case of causal SI we then extend the noise-free setting into a problem of com- 
munication over noisy discrete memoryless channels with random states known causally 
or non-causally at the encoder at all stages of communication. We obtain a single-letter 
characterization of the region of all achievable distortions for both decoders at both stages 
of communication. This characterization reveals that the separation principle is applicable 
for this problem, i.e., it is possible to separately encode the source sequence with a good 
SR source code and then to transmit the obtained bitstreams with a good channel code 
at each stage of communication, without losing asymptotic optimality. This part of the 
paper extends the results of [10] and [11] to the multi-stage multi-decoder communication.
Specifically, in [10] it was shown that the separation principle holds for a single-stage single
encoder-decoder communication over a simple discrete memoryless channel. This setting
has been extended in [11] to communication over a channel with random parameters known
causally or non-causally at the encoder, with a decoder having non-causal access to the SI
correlated with the source; there, too, it was shown that separate source-channel coding
is, in fact, optimal.

Note that all known closed form (single-letter) results regarding SR (and its variations) 
for decoders having non-causal access to different SI data, such as [9], [3], [4] and [6], treat 
the case of degraded SI at the decoders. Thus, there is a special interest in the following
sub-case of the problem treated in this paper: SR with non-causal degraded SI at the decoders,
when decoders are accessed in the reversed order of degradedness of SI. Specifically, for
the two-stage scheme, assume that in the first stage some information is to be conveyed
to the decoder that has access to SI of a better quality. Then, at the refinement stage,
the decoder with less informative SI should reconstruct the source sequence based on the
transmissions of both stages. This problem has also been addressed in [14]. Specifically,
in [14], inner and outer bounds on the achievable rates and distortions have been derived
and it was shown that these bounds coincide when reconstruction at either stage should be
lossless at the matching decoder. The work presented in [14] has been performed in parallel
to the research described in this paper and the inner bounds presented in [14] can be
easily derived from the results of this paper. The outer bound provided in this paper is
more precise than that provided in [14] as is discussed in detail in Section HI 

The outline of the paper is as follows: In Section 2, we give notation conventions used
throughout the paper. A formal definition of the problem is provided in Section 3. In
Section 4, for the case of causal SI at the decoders, we give the exact characterizations of
the achievable rate-distortion region and formulate the coding theorems for the successive-
refinement two-stage source coding and the joint source-channel coding; for the case of
non-causal SI at the decoders, we provide inner and outer bounds to the rate-distortion
region and show that in some cases these bounds are tight. The proofs are provided in
Sections 5 and 6 for the cases of causal and non-causal SI, respectively.

2 Notation Conventions and Preliminaries 

Throughout the paper, random variables will be denoted by capital letters, specific values 
they may take will be denoted by the corresponding lower case letters, and their alphabets 
will be denoted by calligraphic letters. Similarly, random vectors, their realizations, and 
their alphabets will be denoted, respectively, by boldface capital letters, the corresponding 
boldface lower case letters, and calligraphic letters, superscripted by the dimensions. The 
notations $x_i^j$ and $X_i^j$, where $i$ and $j$ are integers and $i < j$, will designate segments $(x_i, \ldots, x_j)$
and $(X_i, \ldots, X_j)$, respectively, where for $i = 1$ the subscript will be omitted. For
example, a random vector $\mathbf{X} = X^N = (X_1, \ldots, X_N)$ ($N$ a positive integer) may take a specific
vector value $\mathbf{x} = x^N = (x_1, \ldots, x_N)$ in $\mathcal{X}^N$, the $N$th order Cartesian power of $\mathcal{X}$, which is
the alphabet of each component of this vector. The cardinality of a finite set $\mathcal{A}$ will be
denoted by $|\mathcal{A}|$.

Sources and channels will be denoted generically by the letter $P$, subscripted by the name
of the random variable and its conditioning, if applicable, e.g., $P_X(x)$ is the probability of
$X = x$, $P_{Y|X}(y|x)$ is the conditional probability of $Y = y$ given $X = x$, and so on. Whenever
clear from the context, these subscripts will be omitted. The class of all discrete memoryless
sources (DMSs) with a finite alphabet $\mathcal{X}$ will be denoted by $\mathcal{P}(\mathcal{X})$, with $P_X$ denoting a
particular DMS in $\mathcal{P}(\mathcal{X})$, i.e., $\mathcal{P}(\mathcal{X}) = \{P_X : \sum_{x \in \mathcal{X}} P_X(x) = 1;\ \forall x \in \mathcal{X} : P_X(x) \ge 0\}$.
For a given positive integer $N$, the probability of an $N$-vector $\mathbf{x} = (x_1, \ldots, x_N)$ drawn from
a DMS $P_X$ is given by

$$\Pr\{X_i = x_i,\ i = 1, \ldots, N\} = \prod_{i=1}^{N} P_X(x_i) = P_X(\mathbf{x}). \quad (1)$$

A Markov chain formed by a triplet of random variables (RVs) $(X, Y, Z)$ with a joint
distribution $P_{XYZ}(x, y, z)$ will be denoted by $X - Y - Z$.

A distortion measure (or distortion function) is a mapping from the set $\mathcal{X} \times \mathcal{Y}$ into the
set of non-negative reals: $d : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^{+}$. The additive distortion $d(\mathbf{x}, \mathbf{y})$ between two
vectors $\mathbf{x} \in \mathcal{X}^N$ and $\mathbf{y} \in \mathcal{Y}^N$ is given by: $d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{N} d(x_i, y_i)$.

The information-theoretic quantities used throughout this paper are denoted using the
conventional notations [12]: for a pair of discrete random variables $(X, Y)$ with a joint
distribution $P_{XY}(x, y) = P_X(x) P_{Y|X}(y|x)$, the entropy of $X$ is denoted by $H(X)$, the joint
entropy by $H(X, Y)$, the conditional entropy of $Y$ given $X$ by $H(Y|X)$, and the mutual
information by $I(X; Y)$, etc., where logarithms are defined to the base 2.
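As a numerical illustration of these quantities, the following minimal Python sketch evaluates $H(X)$, $H(Y|X)$ and $I(X;Y)$ in bits for a joint distribution given as a table. The function names and the toy joint distribution (a uniform binary $X$ observed through a binary symmetric channel with crossover probability 0.1) are assumptions introduced here for illustration only; they do not appear in the paper.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of an iterable of probabilities (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def conditional_entropy(joint):
    """H(Y|X) = H(X,Y) - H(X), where joint[x][y] = P_XY(x, y)."""
    p_x = [sum(row) for row in joint]
    h_xy = entropy(p for row in joint for p in row)
    return h_xy - entropy(p_x)


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), where joint[x][y] = P_XY(x, y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_xy = entropy(p for row in joint for p in row)
    return entropy(p_x) + entropy(p_y) - h_xy


# Hypothetical toy example: X uniform on {0, 1}, Y = X flipped with probability 0.1.
toy_joint = [[0.45, 0.05],
             [0.05, 0.45]]
print(mutual_information(toy_joint), conditional_entropy(toy_joint))
```

For this toy table, $I(X;Y)$ equals one bit minus the binary entropy of 0.1, and $H(Y|X)$ equals the binary entropy of 0.1.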

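We next describe the notation related to the method of types, which is used throughout
this paper in the direct proofs. For a given memoryless source $P_X$ and a vector $\mathbf{x} \in \mathcal{X}^N$,
the empirical probability mass function is a vector $\hat{P}_{\mathbf{x}} = \{\hat{P}_{\mathbf{x}}(a),\ a \in \mathcal{X}\}$, where $\hat{P}_{\mathbf{x}}(a)$ is the
relative frequency of the letter $a \in \mathcal{X}$ in the vector $\mathbf{x}$. For a scalar $\delta > 0$, the set $T_{P_X}^{\delta}$ of all
$\delta$-typical sequences is the set of the sequences $\mathbf{x} \in \mathcal{X}^N$ such that $|\hat{P}_{\mathbf{x}}(a) - P_X(a)| \le \delta$ for
every $a \in \mathcal{X}$. In this paper, we use some known results from [12]. First, for every $\mathbf{x} \in T_{P_X}^{\delta}$,

$$2^{-N[H(X)+\epsilon_1]} \le P_X(\mathbf{x}) \le 2^{-N[H(X)-\epsilon_1]}, \quad (2)$$

where $\epsilon_1 = \epsilon_1(\delta)$ vanishes as $\delta \to 0$ and $N \to \infty$. It is also well known (by the weak law of
large numbers) that

$$\Pr\{\mathbf{X} \notin T_{P_X}^{\delta}\} \le \epsilon_2, \quad (3)$$

where $\epsilon_2 = \epsilon_2(\delta)$ and $\epsilon_2 \to 0$ as $N \to \infty$.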

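For a given conditional distribution $P_{Y|X}$ and for each $\mathbf{x} \in T_{P_X}^{\delta}$, the set $T_{P_{Y|X}}^{\delta}(\mathbf{x})$ of all
sequences $\mathbf{y}$ that are jointly $\delta$-typical with $\mathbf{x}$ is the set of all $\mathbf{y}$ such that

$$|\hat{P}_{\mathbf{xy}}(a, b) - \hat{P}_{\mathbf{x}}(a) P_{Y|X}(b|a)| \le \delta \quad (4)$$

for all $a \in \mathcal{X}$, $b \in \mathcal{Y}$, where $\hat{P}_{\mathbf{xy}}(a, b)$ denotes the fraction of occurrences of the pair $(a, b)$ in
$(\mathbf{x}, \mathbf{y})$. For any $\mathbf{x} \in T_{P_X}^{\delta}$ and any $\tilde{\delta} > \delta$,

$$2^{-N[I(X;Y)+\epsilon_3]} \le \sum_{\mathbf{y}:\,(\mathbf{x},\mathbf{y}) \in T_{P_{XY}}^{\tilde{\delta}}} P_Y(\mathbf{y}) \le 2^{-N[I(X;Y)-\epsilon_3]}, \quad (5)$$

where $\epsilon_3 = \epsilon_3(\delta, \tilde{\delta})$ vanishes as $\delta, \tilde{\delta} \to 0$ and $N \to \infty$. These typicality definitions and
properties are straightforwardly extendable to jointly typical sequences which come in
triplets, quadruplets and so on, and we use these in the paper.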
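The typicality tests described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the helper names are hypothetical, the deviation test is taken as non-strict (at most $\delta$), and the joint test compares the empirical pair frequencies against the empirical marginal of $\mathbf{x}$ multiplied by the conditional law, which is one common convention and may differ in detail from the one intended in [12].

```python
from collections import Counter


def empirical_pmf(x, alphabet):
    """Relative frequency of each letter of `alphabet` in the sequence x."""
    counts = Counter(x)
    return {a: counts[a] / len(x) for a in alphabet}


def is_typical(x, p_x, delta):
    """True if every empirical frequency is within delta of p_x (a dict letter -> prob)."""
    emp = empirical_pmf(x, p_x.keys())
    return all(abs(emp[a] - p_x[a]) <= delta for a in p_x)


def is_jointly_typical(x, y, p_x, p_y_given_x, delta):
    """True if the pair frequencies of (x, y) are within delta of emp_x(a) * P(b|a).

    p_y_given_x[a][b] is the conditional probability of b given a.
    """
    n = len(x)
    pair_counts = Counter(zip(x, y))
    emp_x = empirical_pmf(x, p_x.keys())
    return all(
        abs(pair_counts[(a, b)] / n - emp_x[a] * p_y_given_x[a][b]) <= delta
        for a in p_x
        for b in p_y_given_x[a]
    )


# Hypothetical toy data: a balanced binary sequence and a noiseless "channel".
p_x = {0: 0.5, 1: 0.5}
identity = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
x = [0, 1, 0, 1, 0, 0, 1, 1]
print(is_typical(x, p_x, 0.1), is_jointly_typical(x, x, p_x, identity, 0.1))
```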

3 System Description and Problem Definition 



We refer to the communication system depicted in Figure 1.

Figure 1: Two-stage communication scheme.

Consider a source that produces independent copies $\{X_i, Y_i, Z_i\}_{i \ge 1}$ of a triplet of RVs, $(X, Y, Z)$,
taking values in a finite alphabet $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$, and drawn under a joint distribution $P_{XYZ}$. The
process $\{X_i\}$ is observed at the encoder and is supposed to be reproduced at the decoders, where $\{Y_i\}$
and $\{Z_i\}$ are observed at decoders Y and Z, respectively. The source is available at the
encoder non-causally, and the SI is available at the decoders either causally or non-causally, at all
stages. At the first stage of SR, the reproductions at decoders Y and Z take values in the finite sets
$\hat{\mathcal{X}}_1$ and $\tilde{\mathcal{X}}_1$, respectively, while at the second stage, the reproduction finite sets are $\hat{\mathcal{X}}_2$ and $\tilde{\mathcal{X}}_2$,
respectively.

The coding scheme with causal/non-causal SI at the decoders operates as follows: at the
first transmission, the encoder sends some amount of information to both decoders over the
channel. We consider block coding, i.e., an $N$-vector $\mathbf{X}$ ($N$ is a positive integer) is encoded
at rate $R_1$ into a binary sequence that represents one of $M_1$ messages, where $R_1 = \frac{1}{N}\log_2 M_1$. The
binary sequence then takes values in $\{0, 1, \ldots, 2^{NR_1} - 1\}$. At the first stage, when non-causal SI is considered,
decoder Y receives the binary bitstream and reconstructs $\hat{\mathbf{X}} = (\hat{X}_1, \ldots, \hat{X}_N) \in \hat{\mathcal{X}}_1^N$, based on
it and the SI $\mathbf{Y}$, while in the case of causal SI, the reconstruction of the $i$-th component, $\hat{X}_i$, is
based on the encoder transmission and only the first $i$ symbols of the SI, i.e., $Y^i$. Similarly, with
non-causal SI, decoder Z uses the encoder transmission and $\mathbf{Z}$ in its entirety and reproduces
$\tilde{\mathbf{X}} = (\tilde{X}_1, \ldots, \tilde{X}_N) \in \tilde{\mathcal{X}}_1^N$, while in the case of causal SI, only the bitstream and $Z^i$ are used
for reproduction of $\tilde{X}_i$. The quality of reconstruction at each of the decoders is judged in
terms of the expectations of additive distortion measures $d_{y,1}(\mathbf{X}, \hat{\mathbf{X}}) = \sum_{i=1}^{N} d_{y,1}(X_i, \hat{X}_i)$
and $d_{z,1}(\mathbf{X}, \tilde{\mathbf{X}}) = \sum_{i=1}^{N} d_{z,1}(X_i, \tilde{X}_i)$, where $d_{y,1}(X, \hat{X})$ and $d_{z,1}(X, \tilde{X})$, $X \in \mathcal{X}$, $\hat{X} \in \hat{\mathcal{X}}_1$,
$\tilde{X} \in \tilde{\mathcal{X}}_1$, are non-negative, bounded distortion measures. At the second stage, the encoder
sends, at rate $R_2 - R_1$, additional information about the source sequence to both decoders,
also in the form of a binary bitstream, this time representing one of $M_2 = 2^{N(R_2 - R_1)}$ messages, taking values in
$\{0, 1, \ldots, 2^{N(R_2 - R_1)} - 1\}$. The decoders reconstruct the source sequence with better accuracy
(in terms of the distortion measures) according to both transmissions of the encoder and
the individual SI's. The distortion measures used at the decoders Y and Z at this stage
are also additive, $d_{y,2}(\mathbf{X}, \hat{\mathbf{X}}) = \sum_{i=1}^{N} d_{y,2}(X_i, \hat{X}_i)$ and $d_{z,2}(\mathbf{X}, \tilde{\mathbf{X}}) = \sum_{i=1}^{N} d_{z,2}(X_i, \tilde{X}_i)$,
where $d_{y,2}(X, \hat{X})$ and $d_{z,2}(X, \tilde{X})$, $X \in \mathcal{X}$, $\hat{X} \in \hat{\mathcal{X}}_2$, $\tilde{X} \in \tilde{\mathcal{X}}_2$, are non-negative, bounded
distortion measures. This setting can be straightforwardly extended to any number of
refinement stages as well as any number of decoders at each stage. We confine ourselves to
the case of two decoders and two stages.

We begin with the case of non-causal SI. 

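Definition 1. An $(N, M_1, M_2, \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2})$ source code for a single encoder, two
decoders and two-stage successive refinement with non-causal SI at the decoders, for the source
$P_{XYZ}$, consists of a first-stage encoder-decoder triplet $(f_1, g_{y,1}, g_{z,1})$:

$$f_1 : \mathcal{X}^N \to \{1, 2, \ldots, M_1\}, \quad (6)$$
$$g_{y,1} : \mathcal{Y}^N \times \{1, 2, \ldots, M_1\} \to \hat{\mathcal{X}}_1^N, \quad (7)$$
$$g_{z,1} : \mathcal{Z}^N \times \{1, 2, \ldots, M_1\} \to \tilde{\mathcal{X}}_1^N, \quad (8)$$

and a second-stage encoder-decoder triplet $(f_2, g_{y,2}, g_{z,2})$:

$$f_2 : \mathcal{X}^N \to \{1, 2, \ldots, M_2\}, \quad (9)$$
$$g_{y,2} : \mathcal{Y}^N \times \{1, 2, \ldots, M_1\} \times \{1, 2, \ldots, M_2\} \to \hat{\mathcal{X}}_2^N, \quad (10)$$
$$g_{z,2} : \mathcal{Z}^N \times \{1, 2, \ldots, M_1\} \times \{1, 2, \ldots, M_2\} \to \tilde{\mathcal{X}}_2^N, \quad (11)$$

such that

$$E d_{y,1}(\mathbf{X}, \hat{\mathbf{X}}) \le N\Delta_{y,1}, \qquad E d_{z,1}(\mathbf{X}, \tilde{\mathbf{X}}) \le N\Delta_{z,1}$$

and

$$E d_{y,2}(\mathbf{X}, \hat{\mathbf{X}}) \le N\Delta_{y,2}, \qquad E d_{z,2}(\mathbf{X}, \tilde{\mathbf{X}}) \le N\Delta_{z,2}.$$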

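When SI is available to the decoders causally, in analogy to Definition 1, it is possible
to define an $(N, M_1, M_2, \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2})$ source code for coding with causal SI, where the
first-stage decoder pair $(g_{y,1}, g_{z,1})$ is now presented via $\{g_{y,1,i}\}_{i=1}^{N}$ and $\{g_{z,1,i}\}_{i=1}^{N}$, where
$g_{y,1,i}$ and $g_{z,1,i}$ denote the reconstruction functions for the $i$-th symbol of $\hat{\mathbf{X}}$ and $\tilde{\mathbf{X}}$,
respectively:

$$g_{y,1,i} : \mathcal{Y}^i \times \{1, 2, \ldots, M_1\} \to \hat{\mathcal{X}}_1, \quad (12)$$
$$g_{z,1,i} : \mathcal{Z}^i \times \{1, 2, \ldots, M_1\} \to \tilde{\mathcal{X}}_1. \quad (13)$$

Similar adjustments of definitions should be applied to the second stage, considering now
$(g_{y,2}, g_{z,2})$ presented in terms of $\{g_{y,2,i}\}_{i=1}^{N}$ and $\{g_{z,2,i}\}_{i=1}^{N}$:

$$g_{y,2,i} : \mathcal{Y}^i \times \{1, 2, \ldots, M_1\} \times \{1, 2, \ldots, M_2\} \to \hat{\mathcal{X}}_2, \quad (14)$$
$$g_{z,2,i} : \mathcal{Z}^i \times \{1, 2, \ldots, M_1\} \times \{1, 2, \ldots, M_2\} \to \tilde{\mathcal{X}}_2. \quad (15)$$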

The sum-rate pair $(R_1, R_2)$ of the $(N, M_1, M_2, \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2})$ code for two-stage successive
refinement for two decoders is given by $R_1 = \frac{1}{N}\log_2(M_1)$ and $R_2 = \frac{1}{N}\log_2(M_1 \cdot M_2)$.
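The relation between the message-set sizes and the sum-rates can be checked with a short Python sketch. The function name and the numbers below are hypothetical and serve only to illustrate the definitions; the per-stage refinement rate is obtained as the difference of the two sum-rates.

```python
import math


def sum_rates(block_length, m1, m2):
    """Return (R1, R2) in bits per source symbol for message-set sizes m1 and m2."""
    r1 = math.log2(m1) / block_length
    r2 = math.log2(m1 * m2) / block_length
    return r1, r2


# Hypothetical example: N = 8 source symbols, 16 first-stage and 4 second-stage messages.
r1, r2 = sum_rates(8, 16, 4)
refinement_rate = r2 - r1  # equals log2(m2) / N
print(r1, r2, refinement_rate)
```

With these numbers the first-stage rate is 0.5 bit per symbol, the sum-rate is 0.75 bit per symbol, and the refinement rate is 0.25 bit per symbol.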

Definition 2. Given a distortion quadruplet $D = \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2}$, a rate pair $(R_1, R_2)$
is said to be achievable with SI $(Y, Z)$ if for every $\epsilon > 0$, there exists a sufficiently large
block length $N$, for which there is an $(N, 2^{N(R_1+\epsilon)}, 2^{N(R_2+\epsilon)}, \Delta_{y,1} + \epsilon, \Delta_{z,1} + \epsilon, \Delta_{y,2} + \epsilon,
\Delta_{z,2} + \epsilon)$ source code for successive refinement with non-causal SI at the decoders for the
source $P_{XYZ}$.
The definition of the notion of an achievable region with causal SI per-stage rates can
be straightforwardly modified in parallel to Definition 2, referring to the first-stage rate
$R_1$ and the second-stage rate $\Delta R = R_2 - R_1 = \frac{1}{N}\log_2(M_2)$. The collection of all $D$-
achievable rate pairs is the achievable rate-region for successive-refinement coding with
non-causal (respectively, causal) SI and is denoted by $\mathcal{R}(D)_{nc}$ (respectively, $\mathcal{R}(D)_{c}$). The
collection of all $(R_1, R_2, \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2})$-achievable rate-distortion tuples is the achievable
rate-distortion region, and is denoted by $\mathcal{RD}_{nc}$ and $\mathcal{RD}_{c}$, referring to non-causal and causal
settings, respectively. In this work, we propose strategies for (asymptotically) achieving any
given point in $\mathcal{RD}_{c}$ and certain points in $\mathcal{RD}_{nc}$.

It is also interesting to investigate the scenario where communication between the
encoder and the decoders is carried over a noisy medium. In this case, the source block $\mathbf{X}$ is fed
into a joint source-channel encoder, whereas the corresponding blocks of $\mathbf{Y}$ and $\mathbf{Z}$ are fed as
side information in either a causal or non-causal manner into the Y and Z decoders,
respectively. In the sequel, we confine ourselves to the case of causal source SI at both decoders².
In this paper, at each stage of communication, the noisy medium is modeled by a discrete
memoryless channel whose output is governed by its input and a random parameter which
is known at the encoder either causally or non-causally.

Consider the communication scheme depicted in Figure 2. The channel used at the first
stage is Channel 1, $P_{B|A,S}$, and the channel used at the second stage is Channel 2, $P_{\tilde{B}|\tilde{A},\tilde{S}}$. The channels
are independent and we denote their capacities by $C_1$ and $C_2$, respectively. The channels
work as follows: The input of Channel 1 is a vector pair $(A^n, S^n)$, where $n$ is a positive
integer and where $A$ and $S$ take values in the finite sets $\mathcal{A}$ and $\mathcal{S}$, respectively. Channel
1 produces a vector output $B^n$, whose components take values in the finite set $\mathcal{B}$. The
conditional probability of $B^n$ given $(A^n, S^n)$ is characterized by $P_{B^n|A^n,S^n}(b^n|a^n, s^n) =
\prod_{i=1}^{n} P_{B|A,S}(b_i|a_i, s_i)$. The vector $A^n$ is referred to as the channel input and $S^n$ is referred to
as the channel state sequence, governed by another discrete memoryless process $P_{S^n}(s^n) =
\prod_{i=1}^{n} P_S(s_i)$, independently of $(X^N, Y^N, Z^N)$. The operation of Channel 2 is described in
a similar fashion by the triplet $(\tilde{A}^m, \tilde{B}^m, \tilde{S}^m)$ instead of $(A^n, B^n, S^n)$ and the corresponding
marginal and conditional probabilities. Note that in the context of Channel 2, all blocks
are of length $m$, where $m$ is a positive integer. We denote the source-channel rate ratios by
$\rho_1 = n/N$ and $\rho_2 = m/N$.

2 Since the complete characterization of $\mathcal{RD}_{nc}$ is still open, there is no point in analyzing the scenario of
communication over noisy channels for the case of non-causal source SI at the decoders.

Figure 2: Communication over noisy channels with causal SI.

Now, instead of the binary bitstream generated in the noise-free case, the first-stage
joint source-channel encoder implements a deterministic function $a^n = f_1(x^N, s^n)$ and the
second-stage joint source-channel encoder implements another deterministic function $\tilde{a}^m =
f_2(x^N, \tilde{s}^m)$. If the channel states are available at the encoder causally, each channel symbol
$a_i$ depends only on $x^N$, $a^{i-1}$ and $s^i$, and each $\tilde{a}_j$ depends only on $x^N$, $\tilde{a}^{j-1}$ and $\tilde{s}^j$. In the
non-causal case, each channel symbol $a_i$ depends on $x^N$, $a^{i-1}$ and $s^n$, and each $\tilde{a}_j$ depends
on $x^N$, $\tilde{a}^{j-1}$ and $\tilde{s}^m$. The first-stage decoders Y and Z are defined now by deterministic
functions $g_{y,1}(y^N, b^n)$ and $g_{z,1}(z^N, b^n)$, respectively, and the second-stage decoders Y and
Z are defined by deterministic functions $g_{y,2}(y^N, b^n, \tilde{b}^m)$ and $g_{z,2}(z^N, b^n, \tilde{b}^m)$, respectively.
The channel states $S$ and $\tilde{S}$ are independent and we interpret the independence of the
channels via the Markov relation $(S, B) - X - (\tilde{S}, \tilde{B})$.

In parallel to Definitions 1 and 2, we define the following:
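Definition 3. For a given memoryless source $P_{XYZ}$ and two memoryless channels with
random states, $P_{B|A,S}$ and $P_{\tilde{B}|\tilde{A},\tilde{S}}$, an $(N, n, m, \Delta_{y,1}, \Delta_{z,1}, \Delta_{y,2}, \Delta_{z,2})$ joint source-channel
code for successive refinement with causal state information at the encoder and causal side
information at the decoders consists of a sequence of $n$ first-stage encoding functions:

$$f_{1,i} : \mathcal{X}^N \times \mathcal{S}^i \to \mathcal{A}, \quad i = 1, \ldots, n, \quad (16)$$

a sequence of $N$ first-stage decoding functions

$$g_{y,1,i} : \mathcal{Y}^i \times \mathcal{B}^n \to \hat{\mathcal{X}}_1, \quad i = 1, \ldots, N, \quad (17)$$

and

$$g_{z,1,i} : \mathcal{Z}^i \times \mathcal{B}^n \to \tilde{\mathcal{X}}_1, \quad i = 1, \ldots, N, \quad (18)$$

a sequence of $m$ second-stage encoder functions

$$f_{2,i} : \mathcal{X}^N \times \tilde{\mathcal{S}}^i \to \tilde{\mathcal{A}}, \quad i = 1, \ldots, m, \quad (19)$$

and a sequence of $N$ second-stage decoding functions:

$$g_{y,2,i} : \mathcal{Y}^i \times \mathcal{B}^n \times \tilde{\mathcal{B}}^m \to \hat{\mathcal{X}}_2, \quad i = 1, \ldots, N, \quad (20)$$

and

$$g_{z,2,i} : \mathcal{Z}^i \times \mathcal{B}^n \times \tilde{\mathcal{B}}^m \to \tilde{\mathcal{X}}_2, \quad i = 1, \ldots, N, \quad (21)$$

such that

$$E d_{y,1}(\mathbf{X}, \hat{\mathbf{X}}) \le N\Delta_{y,1}, \qquad E d_{z,1}(\mathbf{X}, \tilde{\mathbf{X}}) \le N\Delta_{z,1}$$

and

$$E d_{y,2}(\mathbf{X}, \hat{\mathbf{X}}) \le N\Delta_{y,2}, \qquad E d_{z,2}(\mathbf{X}, \tilde{\mathbf{X}}) \le N\Delta_{z,2},$$

where the expectations are w.r.t. the source and the channels.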

Definition 4. Given the source-channel rate ratios $\rho_1$ and $\rho_2$, a distortion quadruplet $D =
\{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2}$ is said to be achievable if for every $\epsilon > 0$, there exist sufficiently large $N$, $n$
and $m$, with $\rho_1 = n/N$ and $\rho_2 = m/N$, and there exists an $(N, n, m, \Delta_{y,1} + \epsilon, \Delta_{z,1} + \epsilon, \Delta_{y,2} +
\epsilon, \Delta_{z,2} + \epsilon)$ joint source-channel code for successive refinement with causal/non-causal state
information at the encoder and causal side information at the decoders for the source $P_{XYZ}$
and the channels $P_{B|A,S}$, $P_{\tilde{B}|\tilde{A},\tilde{S}}$. The distortion region, denoted $\mathcal{D}$, is the closure of the set
of all achievable quadruplets $D$.
We provide a single-letter characterization of $\mathcal{D}$ for the cases of causal/non-causal
channel state information availability at the encoder. In particular, we show that any given
point in $\mathcal{D}$ can be achieved by separate source coding for the source $P_{XYZ}$ (achieving $\mathcal{RD}_{c}$)
and capacity-achieving channel coding (independently of the source).

4 Main Result 

4.1 Causal Side Information 
4.1.1 Pure Source Coding 

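We begin with the case where availability of SI at the decoders is restricted to be causal.
Let a distortion quadruplet $D = \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2}$ be given. Define $\mathcal{R}^*(D)_{c}$ to be the set of
all rate pairs $(R_1, R_2)$ for which there exist RVs $(W_1, W_2)$, taking values in finite alphabets
$\mathcal{W}_1$, $\mathcal{W}_2$, respectively, such that the following hold simultaneously:

1. The following Markov chain holds:

$$(W_1, W_2) - X - (Y, Z). \quad (22)$$

2. There exist deterministic decoding functions $G_{y,1} : \mathcal{Y} \times \mathcal{W}_1 \to \hat{\mathcal{X}}_1$, $G_{z,1} : \mathcal{Z} \times \mathcal{W}_1 \to \tilde{\mathcal{X}}_1$,
and $G_{y,2} : \mathcal{Y} \times \mathcal{W}_1 \times \mathcal{W}_2 \to \hat{\mathcal{X}}_2$, $G_{z,2} : \mathcal{Z} \times \mathcal{W}_1 \times \mathcal{W}_2 \to \tilde{\mathcal{X}}_2$, such that

$$E d_{y,1}(X, G_{y,1}(Y, W_1)) \le \Delta_{y,1}, \quad (23)$$
$$E d_{z,1}(X, G_{z,1}(Z, W_1)) \le \Delta_{z,1}, \quad (24)$$
$$E d_{y,2}(X, G_{y,2}(Y, W_1, W_2)) \le \Delta_{y,2}, \quad (25)$$
$$E d_{z,2}(X, G_{z,2}(Z, W_1, W_2)) \le \Delta_{z,2}. \quad (26)$$

3. The alphabets $\mathcal{W}_1$ and $\mathcal{W}_2$ satisfy:

$$|\mathcal{W}_1| \le |\mathcal{X}| + 5, \qquad |\mathcal{W}_2| \le |\mathcal{X}| \cdot |\mathcal{W}_1| + 2. \quad (27)$$

4. The rates $R_1$ and $R_2$ satisfy

$$R_1 \ge I(X; W_1), \qquad R_2 - R_1 \ge I(X; W_2 | W_1). \quad (28)$$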



The main result of this subsection is the following: 
Theorem 1. For any DMS $P_{XYZ}$,

$$\mathcal{R}(D)_{c} = \mathcal{R}^*(D)_{c}. \quad (29)$$
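To make the rate condition (28) concrete, the following Python sketch evaluates $I(X;W_1)$ and $I(X;W_2|W_1)$ for a given source distribution and a given test channel from $X$ to $(W_1, W_2)$, and then checks whether a candidate sum-rate pair satisfies (28). It is only an illustration under stated assumptions: the function names, the dictionary-based representation, and the toy test channel are hypothetical, and the distortion constraints (23)-(26) are not checked here. Generating $(W_1, W_2)$ from $X$ alone automatically respects the Markov chain (22).

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginalize a dict keyed by tuples onto the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def rate_terms(p_x, p_w_given_x):
    """Return (I(X;W1), I(X;W2|W1)); p_w_given_x[x][(w1, w2)] is the test channel."""
    joint = {(x, w1, w2): p_x[x] * q
             for x in p_x
             for (w1, w2), q in p_w_given_x[x].items()}

    def h(idx):
        return entropy(marginal(joint, idx))

    i_x_w1 = h((0,)) + h((1,)) - h((0, 1))
    # I(X;W2|W1) = H(X,W1) + H(W1,W2) - H(W1) - H(X,W1,W2)
    i_x_w2_given_w1 = h((0, 1)) + h((1, 2)) - h((1,)) - h((0, 1, 2))
    return i_x_w1, i_x_w2_given_w1


def satisfies_rate_condition(r1, r2, p_x, p_w_given_x):
    """Check R1 >= I(X;W1) and R2 - R1 >= I(X;W2|W1) for sum-rates (r1, r2)."""
    i1, i2 = rate_terms(p_x, p_w_given_x)
    return r1 >= i1 and (r2 - r1) >= i2


# Hypothetical toy choice: X uniform binary, W1 a noisy copy of X (flip prob. 0.1), W2 = X.
p_x = {0: 0.5, 1: 0.5}
test_channel = {0: {(0, 0): 0.9, (1, 0): 0.1},
                1: {(1, 1): 0.9, (0, 1): 0.1}}
print(rate_terms(p_x, test_channel))
print(satisfies_rate_condition(0.6, 1.1, p_x, test_channel))
```

In this toy case the two terms add up to one bit, since the second stage describes $X$ losslessly given the first-stage description.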

The proof of Theorem 1 appears in Section 5. Note that when SI is available at the
decoders causally, there is no degradedness assumption on SI, which is very different from
the case of SR with non-causal SI even when a single decoder is considered at each stage
[4]-[6], as well as for the multi-group SR discussed in the next section.

The relative simplicity of the characterization of $\mathcal{R}(D)_{c}$ is better understood when studying
the achievability scheme: The direct part is based on the fact that the encoder transmits a
concatenation of indexes of the auxiliary codewords³ instead of the bin numbers transmitted
in the non-causal setting [1]-[6]. Hence, each decoder can access all the auxiliary codewords
directly and, unlike in the non-causal setting, it does not use its SI to retrieve codewords, but
only for reconstruction. Unlike in the case of coding with non-causal SI at the decoders, the
results obtained for the two-decoder two-stage coding with causal SI are straightforwardly
extendable to any number of decoders and refinement stages, and the number of auxiliary
RVs is determined solely by the number of communication stages⁴.

4.1.2 Joint Source-Channel Coding 

We next address the problem of joint source-channel coding, where at each communication
stage the encoder conveys its information to two decoders over a noisy stationary memoryless
channel governed by a random state, which is known causally or non-causally to the encoder.
The general scheme is described in Fig. 2. The necessary and sufficient conditions for
$(\Delta_1, \Delta_2)$ to be the achievable distortion levels are summarized in the following theorem:

Theorem 2. Given a DMS $P_{XYZ}$, the distortion levels $\{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2}$ are achievable for
successively refinable communication with causal SI at the decoders over noisy stationary
memoryless channels $P_{B|A,S}$ and $P_{\tilde{B}|\tilde{A},\tilde{S}}$, with channel states known at the encoder either
causally or non-causally, if and only if there exist auxiliary RVs $W_1$ and $W_2$, taking values
in finite alphabets $\mathcal{W}_1$ and $\mathcal{W}_2$, of cardinalities given by (27) and satisfying (22), and
deterministic decoding functions $G_{y,1}$, $G_{z,1}$, $G_{y,2}$ and $G_{z,2}$, satisfying (23)-(26), respectively,
such that

$$I(X; W_1) \le \rho_1 C_1, \quad (30)$$
$$I(X; W_2 | W_1) \le \rho_2 C_2. \quad (31)$$

3 The direct use of indexes of the auxiliary codewords, similarly to what is done for coding without SI at the
decoder, was first introduced in [15], in the achievability proof of the characterization of the rate-distortion
function with causal SI at the decoder.

4 In contrast, as we show in the next section, in the non-causal setting, at each stage, for each decoder, at least
one auxiliary codeword is added to the direct scheme.



There is an obvious similarity between the characterization of $\mathcal{RD}_{c}$ and the
characterization of the region of all achievable distortion levels described in Theorem 2, both for
the cases of causal and non-causal state information at the encoder. The only difference
between the characterizations is the following: in the case of communication over noisy channels, the
upper bounds in (30) and (31) are $\rho_1 C_1$ and $\rho_2 C_2$, while in the noise-free case, these bounds
are replaced by $R_1$ and $R_2 - R_1$, respectively. Therefore, a possible achievability scheme
is the one based on separate source and channel coding.

The direct part of Theorem 2 follows from a concatenation of an asymptotically optimal
source code designed for multi-group successive refinement, which is independent of the
channels, with reliable channel codes, independent of the source, designed for each of the
channels (with channel state information available to the encoder either causally or
non-causally). The channel codes should achieve (at least asymptotically) the capacity of the
relevant channels. Now, if such source and channel codes are used and the distortion
constraints are maintained by the source code, then as soon as $I(X;W_1) \le \rho_1 C_1$ and
$I(X;W_2|W_1) \le \rho_2 C_2$, it is always possible to select source and channel rates $R_{s1}$ and
$R_{c1}$ for the first stage and $R_{s2} - R_{s1}$ and $R_{c2}$ for the second stage such that
$N I(X;W_1) \le N R_{s1} = n R_{c1} \le n C_1$ and $N I(X;W_2|W_1) \le N[R_{s2} - R_{s1}] = m R_{c2} \le m C_2$.
It is then possible to compress the source sequence into $R_{s1}$ bits per symbol for the first
stage and into $R_{s2} - R_{s1}$ bits per symbol for the refinement stage, such that the distortions
$\{(\Delta_{y,j}, \Delta_{z,j})\}_{j=1}^{2}$ are satisfied, and then map the obtained bitstreams of length
$N R_{s1}$ and $N[R_{s2} - R_{s1}]$ into channel codewords of length $n R_{c1}$ and $m R_{c2}$,
respectively. Since $R_{c1} \le C_1$ and $R_{c2} \le C_2$, from the standard coding theorem ([18] or [17]),
there exist channel codes that cause asymptotically negligible distortions. Also, by the
source coding theorem (Theorem 1), all the distortions for which $N I(X;W_1) \le N R_{s1}$ and
$N I(X;W_2|W_1) \le N[R_{s2} - R_{s1}]$ are achievable. Thus, the distortions
$\{(\Delta_{y,j}, \Delta_{z,j})\}_{j=1}^{2}$ such that $N I(X;W_1) \le n C_1$ and $N I(X;W_2|W_1) \le m C_2$
are achievable. The details of the converse proof are provided in Section 5 and, similarly
as in the noise-free case, the proof is easily extendable to more than two communication
stages and more than two decoders at each stage.
4.2 Non-Causal Degraded Side Information

Unlike in the case of causal SI, a closed-form characterization of the
achievable rate-distortion region with non-causal SI at the decoders is yet to be derived.
In this subsection, we provide outer and inner bounds to the achievable region, discuss the 
differences between the bounds and show that in certain cases, the bounds coincide, i.e., 
the rate-distortion region is fully characterized for these special cases. We begin with the 
outer bound. 

4.2.1 Outer Bound 

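Define $\mathcal{R}^{**}(D)_{nc}$ to be the set of all rate pairs $(R_1, R_2)$ for which there exist RVs $\{W_i\}_{i=1}^{4}$
and $V$, taking values in finite alphabets $\{\mathcal{W}_i\}_{i=1}^{4}$ and $\mathcal{V}$, respectively, such that (s.t.) the
following conditions are satisfied:

1. $$(W_1, W_2, W_3, W_4, V) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y \tag{32}$$

is a Markov chain.

2. There exist deterministic decoding functions $G_{y,1}: \mathcal{Y} \times \mathcal{W}_1 \to \hat{\mathcal{X}}$, $G_{z,1}: \mathcal{Z} \times \mathcal{W}_1 \times \mathcal{W}_2 \times \mathcal{V} \to \hat{\mathcal{X}}$,
$G_{y,2}: \mathcal{Y} \times \mathcal{W}_1 \times \mathcal{W}_3 \times \mathcal{V} \to \hat{\mathcal{X}}$ and $G_{z,2}: \mathcal{Z} \times \mathcal{W}_1 \times \mathcal{W}_2 \times \mathcal{W}_3 \times \mathcal{W}_4 \times \mathcal{V} \to \hat{\mathcal{X}}$,
such that

\begin{align}
E d_{y,1}(X, G_{y,1}(Y, W_1)) &\le \Delta_{y,1} \tag{33} \\
E d_{z,1}(X, G_{z,1}(Z, W_1, W_2, V)) &\le \Delta_{z,1} \tag{34} \\
E d_{y,2}(X, G_{y,2}(Y, W_1, W_3, V)) &\le \Delta_{y,2} \tag{35} \\
E d_{z,2}(X, G_{z,2}(Z, W_1, W_2, W_3, W_4, V)) &\le \Delta_{z,2} \tag{36}
\end{align}

3. The alphabets $\{\mathcal{W}_k\}_{k=1}^{4}$ and $\mathcal{V}$ satisfy:

\begin{align}
|\mathcal{W}_1| &\le |\mathcal{X}| + 5, \tag{37} \\
|\mathcal{V}| &\le |\mathcal{X}| \cdot |\mathcal{W}_1| + 4, \tag{38} \\
|\mathcal{W}_2| &\le |\mathcal{X}| \cdot |\mathcal{W}_1| \cdot |\mathcal{V}| + 3, \tag{39} \\
|\mathcal{W}_3| &\le |\mathcal{X}| \cdot |\mathcal{W}_1| \cdot |\mathcal{W}_2| \cdot |\mathcal{V}| + 2, \tag{40} \\
|\mathcal{W}_4| &\le |\mathcal{X}| \cdot |\mathcal{W}_1| \cdot |\mathcal{W}_2| \cdot |\mathcal{W}_3| \cdot |\mathcal{V}| + 1. \tag{41}
\end{align}

4. The rates $R_1$ and $R_2$ satisfy

\begin{align}
R_1 &\ge I(X; W_1|Y) + I(X; W_2, V|W_1, Z) \tag{42} \\
R_2 &\ge I(X; W_1, W_3, V|Y) + I(X; W_2, W_4|W_1, W_3, V, Z) \tag{43}
\end{align}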

The outer bound to the rate-distortion region is summarized in the following Theorem: 

Theorem 3. For any DMS $P_{XYZ}$ s.t. $X \leftrightarrow Z \leftrightarrow Y$, and a quadruplet of distortions
$D = \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2}$, $\mathcal{R}(D)_{nc} \subseteq \mathcal{R}^{**}(D)_{nc}$.

The proof of this result follows the lines of the converse proof of Theorem 1 in [1] and
is provided in Section 6. Consider now the case where the distortion requirements are
$\Delta_{y,1} = \infty$ and $\Delta_{z,2} = \Delta_{z,1}$, i.e., the case where at the first stage only the Z-decoder is required
to reconstruct the source, and at the second stage the Y-decoder is required to reconstruct
the source while the Z-decoder is not required to improve its source reconstruction any further.
Denote the region $\mathcal{R}(D)_{nc}$ restricted to $\Delta_{y,1} = \infty$ and
$\Delta_{z,2} = \Delta_{z,1}$ by $\mathcal{R}(\Delta_{y,2}, \Delta_{z,1})$. This special instance of our problem has been studied in [14].
The outer bound obtained in [13] is the following: Define the region $\mathcal{R}_{out}(\Delta_1, \Delta_2)$ to be
the set of all rate pairs $(R_1, R_2)$ for which there exist random variables $(W_1, W_2)$ in finite
alphabets $\mathcal{W}_1$, $\mathcal{W}_2$ s.t. the following conditions are satisfied:

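1) $(W_1, W_2) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$.

2) There exist deterministic maps $G_1: \mathcal{Z} \times \mathcal{W}_1 \to \hat{\mathcal{X}}$ and $G_2: \mathcal{Y} \times \mathcal{W}_2 \to \hat{\mathcal{X}}$ s.t.
$E d_{z,1}(X, G_1(Z, W_1)) \le \Delta_1$ and $E d_{y,2}(X, G_2(Y, W_2)) \le \Delta_2$.

3) $|\mathcal{W}_1| \le |\mathcal{X}|(|\mathcal{X}| + 3) + 2$, $|\mathcal{W}_2| \le |\mathcal{X}| + 3$.

4) The non-negative rate vectors satisfy:

$$R_1 \ge I(X; W_1|Z), \qquad R_1 + R_2 \ge I(X; W_2|Y) + I(X; W_1|Z, W_2).$$

Theorem 4. For any discrete memoryless stochastic source with SIs under the Markov
condition $X \leftrightarrow Z \leftrightarrow Y$, $\mathcal{R}(\Delta_1, \Delta_2) \subseteq \mathcal{R}_{out}$.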

Note that this outer bound is straightforwardly obtainable from the outer bound of this
paper by taking $W_1 = \text{const.}$, $V = \text{const.}$, $W_4 = \text{const.}$ and renaming the pair $(W_2, W_3)$ to be
$(W_1, W_2)$, as well as setting $(\Delta_{y,1}, \Delta_{z,1}, \Delta_{y,2}, \Delta_{z,2})$ to be equal to $(\infty, \Delta_1, \Delta_2, \Delta_1)$, respectively,
and also disregarding $(G_{y,1}, G_{z,2})$ while renaming $(G_{z,1}, G_{y,2})$ to be $(G_1, G_2)$.
4.2.2 Inner Bound

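Let a distortion quadruplet $D = \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2}$ be given. Define $\mathcal{R}^{*}(D)_{nc}$ to be the set
of all rate pairs $(R_1, R_2)$ for which there exist RVs $\{W_i\}_{i=1}^{4}$ and $V$, taking values in finite
alphabets $\{\mathcal{W}_i\}_{i=1}^{4}$ and $\mathcal{V}$, respectively, s.t. the following conditions are satisfied:

1. The following Markov conditions hold:

\begin{align}
(W_1, W_2, W_3, W_4, V) &\leftrightarrow X \leftrightarrow Z \leftrightarrow Y \tag{44} \\
W_2 &\leftrightarrow (X, W_1, V) \leftrightarrow W_3 \tag{45}
\end{align}

2. There exist deterministic decoding functions $G_{y,1}: \mathcal{Y} \times \mathcal{W}_1 \to \hat{\mathcal{X}}$, $G_{z,1}: \mathcal{Z} \times \mathcal{W}_1 \times \mathcal{W}_2 \times \mathcal{V} \to \hat{\mathcal{X}}$,
$G_{y,2}: \mathcal{Y} \times \mathcal{W}_1 \times \mathcal{W}_3 \times \mathcal{V} \to \hat{\mathcal{X}}$, $G_{z,2}: \mathcal{Z} \times \mathcal{W}_1 \times \mathcal{W}_2 \times \mathcal{W}_3 \times \mathcal{W}_4 \times \mathcal{V} \to \hat{\mathcal{X}}$ such that

\begin{align}
E d_{y,1}(X, G_{y,1}(Y, W_1)) &\le \Delta_{y,1} \tag{46} \\
E d_{z,1}(X, G_{z,1}(Z, W_1, W_2, V)) &\le \Delta_{z,1} \tag{47} \\
E d_{y,2}(X, G_{y,2}(Y, W_1, W_3, V)) &\le \Delta_{y,2} \tag{48} \\
E d_{z,2}(X, G_{z,2}(Z, W_1, W_2, W_3, W_4, V)) &\le \Delta_{z,2} \tag{49}
\end{align}

3. The alphabets $\{\mathcal{W}_k\}_{k=1}^{4}$ and $\mathcal{V}$ satisfy:

\begin{align}
|\mathcal{W}_1| &\le |\mathcal{X}| + 6, \tag{50} \\
|\mathcal{V}| &\le |\mathcal{X}| \cdot |\mathcal{W}_1| + 5, \tag{51} \\
|\mathcal{W}_2| &\le |\mathcal{X}| \cdot |\mathcal{W}_1| \cdot |\mathcal{V}| + 4, \tag{52} \\
|\mathcal{W}_3| &\le |\mathcal{X}| \cdot |\mathcal{W}_1| \cdot |\mathcal{W}_2| \cdot |\mathcal{V}| + 3, \tag{53} \\
|\mathcal{W}_4| &\le |\mathcal{X}| \cdot |\mathcal{W}_1| \cdot |\mathcal{W}_2| \cdot |\mathcal{W}_3| \cdot |\mathcal{V}| + 2. \tag{54}
\end{align}

4. The rates $R_1$ and $R_2$ satisfy

\begin{align}
R_1 &\ge I(X; W_1|Y) + I(X; W_2, V|W_1, Z) \tag{55} \\
R_2 &\ge I(X; W_1, V, W_3|Y) + I(X; W_2|W_1, V, Z) + I(X; W_4|W_1, W_2, W_3, V, Z) \tag{56}
\end{align}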

The inner bound to the rate-distortion region is summarized in the following Theorem: 
Theorem 5. For any DMS $P_{XYZ}$ s.t. $X \leftrightarrow Z \leftrightarrow Y$, and a quadruplet of distortions
$D = \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2}$, $\mathcal{R}^{*}(D)_{nc} \subseteq \mathcal{R}(D)_{nc}$.

The inner bound provided in this section demonstrates tradeoffs between various schemes 
which are based on the notion of (strong or weak) typicality. Recall that in the achievabil- 
ity schemes of successive refinement treated in [1] and [6] the generation of the auxiliary 
codebooks is sequential: First, the codebook used at the first stage is generated; then, for 
each codeword of that codebook another codebook conditional on the codeword is gener- 
ated, and so on. Every generation of a codebook is conditioned on codewords of previously 
generated codebooks. The encoder chooses the auxiliary codewords in a sequential manner, 
first finding a good codeword in the first codebook; then in the second codebook (which was 
generated conditioned on that good codeword), it finds another good codeword, and so on. 
The encoder proceeds until it has found all codewords needed to describe the source at the 
desired accuracy at all stages of successive refinement. The decoding process at each stage 
is also performed in a sequential manner, i.e., first, the codeword in the first codebook is 
found. Then, in a second codebook (matching that codeword), a second codeword is found 
and so on. 
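The sequential generation and sequential search described above can be pictured with a toy sketch. It is not the random-coding construction of [1] or [6]: binary codewords, a 10% symbol-flip rule standing in for the conditional law, and minimum Hamming distance standing in for joint typicality are all assumptions made here for illustration, and every function name is hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def generate_codebooks(n_first, n_second, length, seed=0):
    """First-stage codebook, then one second-stage codebook per first codeword."""
    rng = random.Random(seed)
    first = [tuple(rng.randint(0, 1) for _ in range(length))
             for _ in range(n_first)]
    # Each second-stage codebook is generated conditioned on its parent
    # codeword: every symbol of the parent is flipped with probability 0.1.
    second = {
        i: [tuple(b ^ (rng.random() < 0.1) for b in w1) for _ in range(n_second)]
        for i, w1 in enumerate(first)
    }
    return first, second


def encode(x, first, second):
    """Sequential search: best first-stage codeword, then best refinement."""
    i = min(range(len(first)), key=lambda k: hamming(x, first[k]))
    j = min(range(len(second[i])), key=lambda k: hamming(x, second[i][k]))
    return i, j


def decode(i, j, first, second):
    """Sequential decoding: locate the first codeword, then its refinement."""
    return first[i], second[i][j]
```

The point of the sketch is only the dependency structure: `second[i]` exists solely relative to `first[i]`, so neither the encoder nor the decoder can address a refinement codeword without first fixing the coarse one.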

When multi-group successive refinement is considered, it is unclear if the auxiliary code- 
books achieving rate-distortion bounds should be generated "sequentially" (in the sense 
described above) or "in parallel", with two or more codebooks generated unconditioned on 
one another. The achievability scheme of this paper demonstrates a semi-parallel approach 
where some of the codebooks are generated sequentially and some in parallel. We proceed 
with discussing the meaning of degraded SI at the decoders and then we briefly describe 
the idea standing behind the achievability scheme. 

When referring to degraded SI, the term usually used is that the stronger Z-decoder 
(that has access to SI of higher quality) can do whatever the weaker Y-decoder can do 
@]-[7], i-e., the Z-decoder can find all the codewords that were addressed to Y-decoder. 
To understand this property, consider the following scenario: Assume that one performs
Wyner-Ziv (W-Z) coding [8] for a pair $(X, Y)$ where $X$ is known at the encoder and $Y$ is
known at the Y-decoder. Now, assume that the source generating the $(X, Y)$ pair is, in
fact, a ternary source, generating a triplet $(X, Y, Z)$ with $X \leftrightarrow Z \leftrightarrow Y$, and that $Z$ is known at the
Z-decoder. Finally, assume that the W-Z coding for the Y-decoder is performed with a codebook
of auxiliary codewords generated independently of each other and each symbol of which is
generated according to $P_{W_1}$, s.t. $W_1 \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$. Obviously, the long Markov chain also
satisfies the shorter Markov chain $W_1 \leftrightarrow X \leftrightarrow Y$ required by the W-Z scheme for coding for the
Y-decoder. But, due to the Markov chain $W_1 \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$, $I(Z; W_1) \ge I(Y; W_1)$, and thus, the
Z-decoder is able to find the correct codeword in the bin of size $2^{N I(Y; W_1)}$ generated for the
Y-decoder. The question is the following: given that the Z-decoder can always find codewords
addressed to the Y-decoder, how can we exploit this property rigorously?
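The inequality $I(Z; W_1) \ge I(Y; W_1)$ is an instance of the data-processing inequality and can be checked on a small example. The sketch below assumes a binary chain built from binary symmetric channels; the crossover probabilities 0.1, 0.05 and 0.2 and all function names are illustrative choices, not values from the paper.

```python
from itertools import product
from math import log2


def bsc(a, b, p):
    """Transition probability of a binary symmetric channel with crossover p."""
    return 1 - p if a == b else p


def pair_marginal(joint, i, j):
    """Marginal pmf of coordinates i and j of a joint pmf given as a dict."""
    out = {}
    for key, prob in joint.items():
        out[(key[i], key[j])] = out.get((key[i], key[j]), 0.0) + prob
    return out


def mutual_information(pair):
    """I(A;B) in bits for a pmf {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), prob in pair.items():
        pa[a] = pa.get(a, 0.0) + prob
        pb[b] = pb.get(b, 0.0) + prob
    return sum(prob * log2(prob / (pa[a] * pb[b]))
               for (a, b), prob in pair.items() if prob > 0)


# Joint pmf of (W1, X, Z, Y) obeying the chain W1 - X - Z - Y, X uniform.
joint = {
    (w, x, z, y): 0.5 * bsc(x, w, 0.1) * bsc(x, z, 0.05) * bsc(z, y, 0.2)
    for w, x, z, y in product((0, 1), repeat=4)
}
i_zw = mutual_information(pair_marginal(joint, 2, 0))  # I(Z;W1)
i_yw = mutual_information(pair_marginal(joint, 3, 0))  # I(Y;W1)
```

With these numbers `i_zw` exceeds `i_yw`, so a bin sized for the Y-decoder is small enough for the Z-decoder to resolve.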

We interpret the degradedness of SI as follows: bins associated with a code designed
for the Z-decoder are divided into bins associated with a code designed for the Y-decoder.
Specifically, a codebook of about $2^{N I(X; W_1)}$ codewords is partitioned twice: first into "large"
bins of about $2^{N I(Z; W_1)}$ codewords matching the W-Z code for the Z-decoder, and each of these
bins is further partitioned into smaller bins of about $2^{N I(Y; W_1)}$ codewords each.
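The double partition can be made concrete with index arithmetic. The sketch below is a simplification: it assigns codewords to bins deterministically by index, whereas a random-binning argument would assign them at random, and the function names are hypothetical.

```python
def nested_bin_indices(codeword_index, large_bin_size, small_bin_size):
    """Return (large-bin index, index of the small bin inside that large bin)."""
    if large_bin_size % small_bin_size != 0:
        raise ValueError("small bins must tile each large bin exactly")
    large = codeword_index // large_bin_size
    offset = codeword_index % large_bin_size   # position inside the large bin
    return large, offset // small_bin_size


def rate_split(i_xw, i_zw, i_yw):
    """Per-symbol rates (bits) of the two indices for given mutual informations.

    First index: large bin, I(X;W) - I(Z;W).
    Second index: small bin within it, I(Z;W) - I(Y;W).
    Their sum is I(X;W) - I(Y;W), the rate of addressing small bins directly.
    """
    return i_xw - i_zw, i_zw - i_yw
```

As a usage example, take $N = 10$ with $I(X;W_1) = 0.8$, $I(Z;W_1) = 0.5$ and $I(Y;W_1) = 0.2$ bits: the codebook has $2^8 = 256$ codewords, large bins hold 32 and small bins hold 4, so `nested_bin_indices(100, 32, 4)` places codeword 100 in large bin 3, small bin 1.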

In W-Z coding designed for communication with Y-decoder only, the indexes of the 
smaller bins are directly transmitted to the Y-decoder. Note that alternatively, one can 
first send to Y-decoder an index of the larger bin and then "refine" it with the "internal" 
index of the matching small bin. This observation immediately leads to the following con- 
clusion: if a single codeword is simultaneously good for communication with both decoders 
(in the sense of satisfying the reconstruction requirements), the encoder can communicate 
with both decoders in a two-stage successive manner, by first transmitting the index of 
a large bin (that contains a good codeword) to both decoders (the index is fully usable 
only by the Z-decoder), and then, in a separate additional transmission, sending the
matching "internal" index, which is crucial for communication with the Y-decoder (and does not
provide new information to the Z-decoder). The obvious question that arises is what
happens when a single codeword is not sufficient for communication with the two decoders and
more codebooks must be created. Firstly, under certain Markov conditions, the principle of
such hierarchical (or nested) binning can be applied to conditional W-Z codes as well.
Specifically, when the Markov condition $(W_1, W_2, \ldots, W_i) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$ holds, we obtain that
$I(Z; W_i|W_1, \ldots, W_{i-1}) \ge I(Y; W_i|W_1, \ldots, W_{i-1})$. Secondly, the real problem arises when not
all codewords sent to the Z-decoder must be revealed to the Y-decoder in the next step, and
in this case, sequential/hierarchical codebook generation is no longer obviously optimal.

The coding scheme is based on the following concept: At the first stage, three codebooks
are generated, essentially according to the hierarchical Wyner-Ziv coding scheme. First, a
codebook $C_{W_1}$ of $\sim 2^{N I(X; W_1)}$ codewords is generated according to $P_{W_1}$ and is partitioned
into bins of size $\sim 2^{N I(Y; W_1)}$. Thus, there are $\sim 2^{N[I(X; W_1) - I(Y; W_1)]}$ such bins. Due to the
Markov chain $W_1 \leftrightarrow X \leftrightarrow Y$, $I(X; W_1) - I(Y; W_1) = I(X; W_1|Y)$. Next, for each $w_1 \in C_{W_1}$,
a codebook $C_V(w_1)$ of $\sim 2^{N I(X; V|W_1)}$ codewords is generated according to $P_{V|W_1}$ and is
partitioned into bins of size $\sim 2^{N I(Z; V|W_1)}$, and each of these bins is partitioned into
smaller bins of size $\sim 2^{N I(Y; V|W_1)}$ each. Thus, there are $\sim 2^{N[I(X; V|W_1) - I(Z; V|W_1)]}$ large bins
and $\sim 2^{N[I(Z; V|W_1) - I(Y; V|W_1)]}$ small bins within each large bin. Due to the Markov chain
(44), the numbers of bins are $\sim 2^{N I(X; V|W_1, Z)}$ large bins and $\sim 2^{N I(Z; V|W_1, Y)}$ small bins within
each large one. Finally, a codebook $C_{W_2}(w_1, v)$ of $\sim 2^{N I(X; W_2|W_1, V)}$ codewords is generated
for each $w_1 \in C_{W_1}$ and $v \in C_V(w_1)$ according to $P_{W_2|W_1 V}$ and is partitioned into bins of
size $\sim 2^{N I(Z; W_2|W_1, V)}$, so by (44), there are $\sim 2^{N I(X; W_2|W_1, V, Z)}$ such bins.

At the second stage, another two codebooks are generated: for each $w_1$, $v(w_1)$ and
$w_2(w_1, v)$, in codebook $C_{W_3}(w_1, v)$, the codewords are generated according to $P_{W_3|W_1 V}$, and
in codebook $C_{W_4}(w_1, v, w_2, w_3)$, the codewords $w_4$ are generated according to $P_{W_4|W_1 V W_2 W_3}$.
These codebooks are also partitioned into bins. Specifically, $C_{W_3}(\cdot)$ is partitioned into
$\sim 2^{N I(X; W_3|W_1, V, Y)}$ bins of size $\sim 2^{N I(Y; W_3|W_1, V)}$ each. Similarly, $C_{W_4}(\cdot)$ is partitioned into
$\sim 2^{N I(X; W_4|W_1, V, W_2, W_3, Z)}$ bins, each of size $\sim 2^{N I(Z; W_4|W_1, V, W_2, W_3)}$. The key feature
of this scheme is the fact that $C_{W_3}(\cdot)$ does not take into consideration the statistics of $W_2$.
Since its codewords must still be used by the Z-decoder, which raises typicality considerations
during the encoding/decoding process, the additional Markov condition (45) is imposed on the
achievability scheme.

If the encoder succeeds in finding good codewords in all five codebooks (the details appear
in the formal proof of Theorem 5), the rate of the first transmission, $R_1$, is composed of
three indexes of the bins that contain good codewords in the codebooks $C_{W_1}$, $C_V(\cdot)$ and
$C_{W_2}(\cdot)$, where for $C_V(\cdot)$ only the index of the large bin is used. In this manner, similarly
to classical W-Z coding, only the codeword of $C_{W_1}$ serves the Y-decoder, while all
three codewords are decoded by the Z-decoder. Hence, $R_1 \approx I(X; W_1|Y) + I(X; V|W_1, Z) +
I(X; W_2|W_1, V, Z)$, as given by eq. (55). At the second transmission, the encoder first
refines the description of the bin of codebook $C_V(\cdot)$, and then transmits the indexes of the
chosen bins in codebooks $C_{W_3}(\cdot)$ and $C_{W_4}(\cdot)$. Thus, the codewords in codebooks $C_{W_1}$, $C_V(\cdot)$
and $C_{W_3}(\cdot)$ serve the reconstruction at the Y-decoder, and all five codewords are retrieved
correctly by the Z-decoder and are used for reconstruction of the source. The incremental rate
at the second stage is $I(Z; V|W_1, Y) + I(X; W_3|W_1, V, Y) + I(X; W_4|W_1, V, W_2, W_3, Z)$ and
therefore, the cumulative rate at the second stage is as given by eq. (56).
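The step from the incremental rate to the cumulative rate uses the identity $I(X;V|W_1,Z) + I(Z;V|W_1,Y) = I(X;V|W_1,Y)$, which holds under the long Markov chain: the large-bin index of $C_V(\cdot)$ plus its refinement cost exactly what a direct W-Z description of $V$ for the Y-decoder would. The sketch below checks this numerically in a simplified setting, with $W_1$ taken constant and binary symmetric links; the crossover values and helper names are assumptions made for illustration.

```python
from itertools import product
from math import log2


def marginal(joint, idx):
    """Marginal pmf over the coordinates listed in idx."""
    out = {}
    for key, prob in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + prob
    return out


def cond_mi(joint, a, b, c):
    """I(A;B|C) in bits, where a, b, c are coordinate positions in the joint pmf."""
    p_abc = marginal(joint, (a, b, c))
    p_ac = marginal(joint, (a, c))
    p_bc = marginal(joint, (b, c))
    p_c = marginal(joint, (c,))
    return sum(
        prob * log2(prob * p_c[(k[2],)] / (p_ac[(k[0], k[2])] * p_bc[(k[1], k[2])]))
        for k, prob in p_abc.items() if prob > 0
    )


def bsc(a, b, p):
    """Transition probability of a binary symmetric channel with crossover p."""
    return 1 - p if a == b else p


# Coordinates: 0 = V, 1 = X, 2 = Z, 3 = Y, with V - X - Z - Y and X uniform.
joint = {
    (v, x, z, y): 0.5 * bsc(x, v, 0.15) * bsc(x, z, 0.05) * bsc(z, y, 0.2)
    for v, x, z, y in product((0, 1), repeat=4)
}
large_bin_rate = cond_mi(joint, 1, 0, 2)   # I(X;V|Z): first-stage index
refinement_rate = cond_mi(joint, 2, 0, 3)  # I(Z;V|Y): second-stage refinement
direct_rate = cond_mi(joint, 1, 0, 3)      # I(X;V|Y): direct W-Z rate for Y
```

Here `large_bin_rate + refinement_rate` agrees with `direct_rate` up to floating-point error, which is why the $V$-terms of (55) and of the incremental rate merge into the single term $I(X; W_1, V, W_3|Y)$ of (56).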

The scheme that leads to the inner bound is interesting due to the following: The code- 
book generation is not fully sequential, but some of the codebooks are generated in parallel 
and are independent (unconditioned) of each other. Unfortunately, with this approach the 
rate expressions of the inner and outer bounds obtained at the second stage are not identical 
and the bounds differ in additional Markov conditions imposed on the auxiliary RVs of the 
direct scheme. Yet, for the case of lossless reconstruction at either the first or the second
stage, i.e., $\Delta_{z,1} = 0$ or $\Delta_{y,2} = 0$, respectively, the achievability scheme achieves the
communication rates suggested by the outer bound and thus closes the gap between the inner and
the outer bounds.

4.2.3 Special Cases 

We now confine our attention to a number of special cases in which the gap between the outer
bound and the inner bound vanishes. First, we consider the case of distortion requirements
$\Delta_{z,1} \ge \Delta_{y,1}$ or $\Delta_{y,2} = \Delta_{y,1}$, that is, SR with respect to only one of the decoders at either the
first or the second stage, respectively. We then consider the case of distortion requirements
$\Delta_{z,1} = 0$ or $\Delta_{y,2} = 0$, that is, lossless reconstruction at the Z-decoder at the first stage, or at
the Y-decoder at the second stage, respectively. For these cases, the achievability scheme
achieves the boundary curve of the outer bound.

Successive Refinement 

When $\Delta_{z,1} \ge \Delta_{y,1}$ or $\Delta_{y,2} = \Delta_{y,1}$, the multi-decoder SR problem degenerates to the
problem of refinement of information with respect to only one decoder at either the first
or the second stage, respectively. The requirement $\Delta_{z,1} \ge \Delta_{y,1}$ fits the scenario where the
Z-decoder performs reconstruction of the source on the basis of the same transmission that
served the Y-decoder, so the average distortion it achieves is at least as small as that of
the Y-decoder. The requirement $\Delta_{y,2} = \Delta_{y,1}$ fits the scenario where the Y-decoder is not
required to refine its reconstruction at the second stage. For these cases, the inner and the
outer bounds coincide, as is summarized in the following theorem:

Theorem 6. If $\Delta_{z,1} \ge \Delta_{y,1}$ or $\Delta_{y,2} = \Delta_{y,1}$, then $\mathcal{R}(D)_{nc} = \mathcal{R}^{**}(D)_{nc} = \mathcal{R}^{*}(D)_{nc}$.
Specifically, when $\Delta_{z,1} \ge \Delta_{y,1}$, $\mathcal{R}(D)_{nc}$ is given as in Subsection 4.2.1 with the rate
inequalities replaced by

$$R_1 \ge I(X; W_1|Y) \quad \text{and} \quad R_2 \ge I(X; W_1, W_3|Y) + I(X; W_4|W_1, W_3, Z), \tag{57}$$

for the auxiliary RVs satisfying $(W_1, W_3, W_4) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$.

When $\Delta_{y,2} = \Delta_{y,1}$, $\mathcal{R}(D)_{nc}$ is given as in Subsection 4.2.1 with the rate inequalities
replaced by

$$R_1 \ge I(X; W_1|Y) + I(X; W_2|W_1, Z) \quad \text{and} \quad R_2 \ge I(X; W_1|Y) + I(X; W_2, W_4|W_1, Z), \tag{58}$$

for the auxiliary RVs satisfying $(W_1, W_2, W_4) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$.

The achievability part of Theorem 6 follows easily by setting $W_2 = V = \text{const.}$
for $\Delta_{z,1} \ge \Delta_{y,1}$ and setting $W_3 = V = \text{const.}$ for the requirement $\Delta_{y,2} = \Delta_{y,1}$ in
$\mathcal{R}^{*}(D)_{nc}$. The converse proof follows by considering a three-stage communication scheme
in the converse proof of [6] and combining two of its stages into a single stage for each of
the above cases. For the case $\Delta_{z,1} \ge \Delta_{y,1}$, the first stage of Theorem 6 is essentially the
first stage of [6], with the transmission addressed to the Y-decoder. The second stage of
Theorem 6 is a combination of the second and the third stages in [6], where at the second
stage of [6], SR is performed with respect to the Y-decoder, and at the third stage of [6],
SR is performed with respect to the Z-decoder. For the case $\Delta_{y,2} = \Delta_{y,1}$, the first stage
of Theorem 6 matches the cumulative rates of two stages of [6], where the first stage consists
of transmission to the Y-decoder and the second stage performs SR with respect to the
Z-decoder. The second stage of Theorem 6 consists of the third stage of [6] with SR performed
(again) with respect to the Z-decoder.

Lossless Reconstruction 

Consider the case of lossless reconstruction at either the Z-decoder at the first stage or the 
Y-decoder at the second stage. Similarly as in [Hj, it turns out that in these cases, the 
inner and outer bounds coincide. This observation is summarized in the following theorem: 

Theorem 7. If $\Delta_{y,2} = 0$ or $\Delta_{z,1} = 0$, then $\mathcal{R}(D)_{nc} = \mathcal{R}^{**}(D)_{nc} = \mathcal{R}^{*}(D)_{nc}$. Specifically,
when $\Delta_{z,1} = 0$, $\mathcal{R}(D)_{nc}$ is given as in Subsection 4.2.1 with the rate inequalities replaced
by

$$R_1 \ge I(X; W_1|Y) + H(X|W_1, Z) \quad \text{and} \quad R_2 \ge I(X; W_1, W_3|Y) + H(X|W_1, W_3, Z), \tag{59}$$

for the auxiliary RVs satisfying $(W_1, W_3) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$.

When $\Delta_{y,2} = 0$, $\mathcal{R}(D)_{nc}$ is given as in Subsection 4.2.1 with the rate inequalities replaced
by

$$R_1 \ge I(X; W_1|Y) + I(X; W_2|W_1, Z) \quad \text{and} \quad R_2 \ge H(X|Y), \tag{60}$$

for the auxiliary RVs satisfying $(W_1, W_2) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$.

The achievability part of Theorem 7 follows easily by setting $W_4 = \text{const.}$,
$W_2 = X$ and $V = W_3$ for the requirement $\Delta_{z,1} = 0$, and setting $V = W_2$ and $W_3 = X$
for the requirement $\Delta_{y,2} = 0$ in the inner bound $\mathcal{R}^{*}(D)_{nc}$. The converse proof follows by
applying the Heegard-Berger rate bounds [9] at both stages with the corresponding demand
of lossless reconstruction at either the first or the second stage. When the outer bound is
considered for each of the stages independently, it degenerates to the Heegard-Berger bound,
and thus an intersection of the Heegard-Berger bounds for the two stages provides a trivial
outer bound to the outer bound obtained in this paper. Since the direct scheme achieves
the communication rates suggested by the intersection, the bounds coincide.
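To get a feel for the lossless-case rates, one can evaluate the entropy terms for a binary example. The sketch assumes a uniform binary $X$ with $Z = X \oplus \mathrm{Bern}(p)$ and $Y = Z \oplus \mathrm{Bern}(q)$, which satisfies $X \leftrightarrow Z \leftrightarrow Y$; this source model and the function names are illustrative assumptions. It evaluates $H(X|Y)$, the bound on $R_2$ in (60), and $H(X|Z)$, which is the value of the first bound in (59) for the particular choice $W_1 = \text{const.}$

```python
from math import log2


def binary_entropy(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def cascade(p, q):
    """Crossover probability of two binary symmetric channels in series."""
    return p * (1 - q) + (1 - p) * q


def lossless_rates(p, q):
    """Entropy terms for the assumed doubly symmetric binary chain."""
    return {
        "H(X|Z)": binary_entropy(p),              # Z-decoder lossless at stage 1
        "H(X|Y)": binary_entropy(cascade(p, q)),  # Y-decoder lossless at stage 2
    }
```

For `p = 0.1` and `q = 0.2` the cascaded crossover is 0.26, so the weaker Y-decoder needs a strictly larger lossless rate than the Z-decoder, in line with the second-stage transmission serving only the Y-decoder.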

The key property of these special cases is the fact that not all auxiliary RVs that
determine both inner and outer bounds are active simultaneously. Specifically, $V$, which
stands for the information transmitted to the Z-decoder at the first stage and then repeated
for the Y-decoder at the second stage, takes very specific values. The requirement $\Delta_{z,1} = 0$
means perfect reconstruction of the source by the Z-decoder at the first stage.
For this case, the Z-decoder obviously needs the full information about the source, in the
spirit of Slepian-Wolf [16] lossless coding. Therefore, the optimal scheme presents the
information sent to the Z-decoder at the first stage as consisting of two (mutually dependent)
parts: the information $V$, which is then revealed (refined) to the Y-decoder at the second
stage ($W_3 = V$), and the information needed by the Z-decoder, i.e., $W_2 = X$. For the
requirement $\Delta_{y,2} = 0$, it is expected that the Y-decoder will receive at the second stage
all the information about the source, also in the spirit of [16]. As some of this information
is already revealed to the Z-decoder at the first stage, all this information is refined to the
Y-decoder at the second stage ($W_2 = V$), and then all remaining information is transmitted
to the Y-decoder directly ($W_3 = X$). Interestingly, the cases considered in Theorem 7 are
characterized by the same property: in both cases, the second stage transmission serves
only the weaker Y-decoder. In the case $\Delta_{y,2} = 0$, it is obvious that $\Delta_{z,2} = 0$ can be
achieved as well. In the case $\Delta_{z,1} = 0$, it is trivially obtained that $\Delta_{z,2} = 0$ as well and
thus, only the Y-decoder benefits from the second stage transmission.



5 Proofs for the Causal Case 

5.1 Proof of the Converse Part of Theorem 2

The pure source-coding problem is a special case of the joint source-channel problem. We
provide a proof of the converse part of Theorem 2, which includes the converse of Theorem
1 as a special case.

Let $(f_1, g_{y,1}, g_{z,1}, f_2, g_{y,2}, g_{z,2})$ be given encoder and decoder functions for which the
distortion constraints are satisfied at both stages. In the proof, for the first and the second
steps of the communication protocol, we examine the mutual informations $I(X^N; B^n)$ and
$I(X^N; \tilde{B}^m)$, respectively.

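Firstly, for the case of causal state information at the encoder, we obtain

\begin{align}
I(X^N; B^n) &= \sum_{i=1}^{n} I(X^N; B_i|B^{i-1}) \nonumber \\
&\le \sum_{i=1}^{n} I(X^N, B^{i-1}; B_i) \nonumber \\
&\le \sum_{i=1}^{n} I(X^N, B^{i-1}, S^{i-1}; B_i) \nonumber \\
&\stackrel{(a)}{=} \sum_{i=1}^{n} I(U_{1,i}; B_i) \nonumber \\
&\stackrel{(b)}{=} n I(U_{1,T}; B_T|T) \nonumber \\
&\stackrel{(c)}{=} n I(U_{1,T}; B|T) \nonumber \\
&\le n I(U_{1,T}, T; B) \nonumber \\
&\stackrel{(d)}{=} n I(U_1; B) \nonumber \\
&\stackrel{(e)}{\le} n C_1, \tag{61}
\end{align}

where (a) follows by denoting $U_{1,i} = (X^N, B^{i-1}, S^{i-1})$ for $i \in \{1, 2, \ldots, n\}$ (note that $U_{1,i}$ and $S_i$
are independent); (b) by defining a time-sharing auxiliary random variable $T$, distributed
uniformly over $\{1, 2, \ldots, n\}$ independently of all other random variables in the system, and
noting that $\sum_{i=1}^{n} I(U_{1,i}; B_i) = n \sum_{i=1}^{n} \frac{1}{n} I(U_{1,i}; B_i) = n I(U_{1,T}; B_T|T)$; (c) by noting that
$B = B_T$ since the DMC is stationary; (d) by denoting the random variable $U_1 = (U_{1,T}, T)$;
and finally, (e) by the standard channel coding theorem with causal state information at
the encoder [17], since $U_1 \leftrightarrow (A, S) \leftrightarrow B$.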

For non-causal availability of state information at the encoder, note that the above
defined RVs $\{U_{1,i}\}$ are, in fact, the same RVs as those used by Gel'fand and Pinsker in [18]
with $X^N$ substituting the message $V$ of [18]. In fact, with $X^N$ substituting the message $V$, the
converse proof of [18] is straightforwardly applicable to our case, as all the conditions of the
proof still hold. Therefore, we can as well upper-bound $I(X^N; B^n)$ by

$$I(X^N; B^n) \stackrel{(a)}{\le} n\{I(U_1; B) - I(U_1; S)\} \stackrel{(b)}{\le} n C_1, \tag{62}$$

where (a) and (b) follow by [18] and $C_1$ stands for the Gel'fand-Pinsker channel capacity.
On the other hand,

\begin{align}
I(X^N; B^n) &= H(X^N) - H(X^N|B^n) \tag{63} \\
&= \sum_{i=1}^{N} [H(X_i|X^{i-1}) - H(X_i|X^{i-1}, B^n)] \tag{64} \\
&\stackrel{(a)}{=} \sum_{i=1}^{N} [H(X_i) - H(X_i|X^{i-1}, Y^{i-1}, Z^{i-1}, B^n)] \tag{65} \\
&= \sum_{i=1}^{N} I(X_i; X^{i-1}, Y^{i-1}, Z^{i-1}, B^n) \tag{66} \\
&\stackrel{(b)}{=} \sum_{i=1}^{N} I(X_i; W_{1,i}) \tag{67} \\
&\stackrel{(c)}{=} N I(X_T; W_{1,T}|T) \tag{68} \\
&\stackrel{(d)}{=} N I(X; W_{1,T}|T) \tag{69} \\
&= N[I(X; W_{1,T}, T) - I(X; T)] \tag{70} \\
&\stackrel{(e)}{=} N I(X; W_{1,T}, T) \tag{71} \\
&\stackrel{(f)}{=} N I(X; W_1), \tag{72}
\end{align}

where (a) follows from the fact that the source is memoryless and from the Markov chain
$X_i \leftrightarrow (X^{i-1}, B^n) \leftrightarrow (Y^{i-1}, Z^{i-1})$; (b) by denoting $W_{1,i} = (X^{i-1}, Y^{i-1}, Z^{i-1}, B^n)$; (c) by
defining a time-sharing auxiliary random variable $T$, distributed uniformly over $\{1, 2, \ldots, N\}$
independently of all other random variables in the system; (d) by noting that $X = X_T$
since the DMS is stationary; (e) is again due to the fact that the source is stationary and
thus $I(X; T) = 0$; and finally, (f) by denoting the random variable $W_1 = (W_{1,T}, T)$.
Thus, for the first stage, we obtain $N I(X; W_1) \le n C_1$, and by dividing both sides of the
inequality by $N$ we end up with $I(X; W_1) \le \rho_1 C_1$, where $C_1$ denotes the channel capacity
for the case of causal or non-causal state availability at the encoder, i.e., condition (30) of
Theorem 2 is satisfied.

As for the second stage, by similar considerations as in (61) and (62), we obtain that
$I(X^N; \tilde{B}^m) \le m C_2$, where $C_2$ stands for the channel capacity of the second channel with state
information available at the encoder (again, either causally or non-causally). Also,

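\begin{align}
I(X^N; \tilde{B}^m) &= H(\tilde{B}^m) - H(\tilde{B}^m|X^N) \tag{73} \\
&\stackrel{(a)}{\ge} H(\tilde{B}^m|B^n) - H(\tilde{B}^m|X^N) + I(B^n; \tilde{B}^m|X^N) \tag{74} \\
&= H(\tilde{B}^m|B^n) - H(\tilde{B}^m|X^N, B^n) \tag{75} \\
&= I(X^N; \tilde{B}^m|B^n) \tag{76} \\
&= \sum_{i=1}^{N} I(X_i; \tilde{B}^m|X^{i-1}, B^n) \tag{77} \\
&= \sum_{i=1}^{N} [H(X_i|X^{i-1}, B^n) \tag{78} \\
&\qquad - H(X_i|X^{i-1}, B^n, \tilde{B}^m)] \tag{79} \\
&\stackrel{(b)}{=} \sum_{i=1}^{N} [H(X_i|X^{i-1}, B^n, Y^{i-1}, Z^{i-1}) \tag{80} \\
&\qquad - H(X_i|X^{i-1}, B^n, \tilde{B}^m, Y^{i-1}, Z^{i-1})] \tag{81} \\
&= \sum_{i=1}^{N} I(X_i; \tilde{B}^m|X^{i-1}, B^n, Y^{i-1}, Z^{i-1}) \tag{82} \\
&\stackrel{(c)}{=} \sum_{i=1}^{N} I(X_i; W_{2,i}|W_{1,i}) \tag{83} \\
&\stackrel{(d)}{=} N I(X; W_{2,T}|W_{1,T}, T) \tag{84} \\
&\stackrel{(e)}{=} N I(X; W_2|W_1), \tag{85}
\end{align}

where (a) follows from the fact that conditioning reduces entropy and from the independence of the
channels described by the Markov chain $(B^n, S^n) \leftrightarrow X^N \leftrightarrow (\tilde{B}^m, \tilde{S}^m)$, by which $I(B^n; \tilde{B}^m|X^N) = 0$; (b) from the Markov
chains $X_i \leftrightarrow (X^{i-1}, B^n) \leftrightarrow (Y^{i-1}, Z^{i-1})$ and $X_i \leftrightarrow (X^{i-1}, B^n, \tilde{B}^m) \leftrightarrow (Y^{i-1}, Z^{i-1})$; (c) from using
the above-defined auxiliary random variables $\{W_{1,i}\}_{i=1}^{N}$ and denoting $W_{2,i} = \tilde{B}^m$; (d)
from using the above-defined random variable $T$ as well as the stationarity of the source; and
finally, (e) from using the above-defined random variable $W_1$ and letting $W_2 = W_{2,T}$.
We obtain, hence, that $N I(X; W_2|W_1) \le m C_2$, and division of both sides of the inequality
by $N$ results in $I(X; W_2|W_1) \le \rho_2 C_2$, which is exactly condition (31) of Theorem 2.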

Also, note that the Markov structure $(W_{1,i}, W_{2,i}) \leftrightarrow X_i \leftrightarrow (Y_i, Z_i)$ holds for every $i =
1, \ldots, N$. Due to this structure and the fact that the source $P_{XYZ}$ is stationary and
memoryless, the Markov chain $(W_1, W_2) \leftrightarrow X \leftrightarrow (Y, Z)$ also holds, and thus, the condition given
by (22) is satisfied.

We next show that there exist functions $G_{y,1}$, $G_{z,1}$, $G_{y,2}$ and $G_{z,2}$ that satisfy (23)
- (26), respectively. Denote by $g_{y,k,i}$ and $g_{z,k,i}$ the outputs of the decoders Y and Z,
respectively, at stages $k = 1, 2$ and times $i = 1, \ldots, N$. The random variable $W_{1,i}$ contains
$(X^{i-1}, Y^{i-1}, Z^{i-1}, B^n)$ and $W_{2,i}$ contains $\tilde{B}^m$. Choose the functions $G_{y,1}$, $G_{y,2}$, $G_{z,1}$ and
$G_{z,2}$ as follows:

\begin{align}
G_{y,1,T}(Y, W_1) &= g_{y,1,T}(Y^T, B^n), \tag{86} \\
G_{z,1,T}(Z, W_1) &= g_{z,1,T}(Z^T, B^n), \tag{87} \\
G_{y,2,T}(Y, W_1, W_2) &= g_{y,2,T}(Y^T, B^n, \tilde{B}^m), \tag{88} \\
G_{z,2,T}(Z, W_1, W_2) &= g_{z,2,T}(Z^T, B^n, \tilde{B}^m). \tag{89}
\end{align}

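We then have for the average distortion$^5$

\begin{align}
E d(X, G_{y,1}(Y, W_1)) &= \frac{1}{N} \sum_{i=1}^{N} E d(X_i, g_{y,1,i}(Y^i, B^n)) \le \Delta_{y,1}, \tag{90} \\
E d(X, G_{z,1}(Z, W_1)) &= \frac{1}{N} \sum_{i=1}^{N} E d(X_i, g_{z,1,i}(Z^i, B^n)) \le \Delta_{z,1}, \tag{91} \\
E d(X, G_{y,2}(Y, W_1, W_2)) &= \frac{1}{N} \sum_{i=1}^{N} E d(X_i, g_{y,2,i}(Y^i, B^n, \tilde{B}^m)) \le \Delta_{y,2} \tag{92}
\end{align}

and

$$E d(X, G_{z,2}(Z, W_1, W_2)) = \frac{1}{N} \sum_{i=1}^{N} E d(X_i, g_{z,2,i}(Z^i, B^n, \tilde{B}^m)) \le \Delta_{z,2}, \tag{93}$$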

5 The definitions in (86) - (89) determine the outputs of the decoder functions at "stochastic" time $T$. For
example, the output of the Y-decoder at the first stage at time $T$ is governed by the first $T$ symbols of the
source SI, i.e., $Y^T$, and the channel output $B^n$.

i.e., the distortion constraints are satisfied.

In order to complete the proof, it is left to show that the cardinality of the alphabets
of the auxiliary RVs $W_1$ and $W_2$ is limited. We use the support lemma [19], which is based on
Caratheodory's theorem, according to which, given $J$ real valued continuous functionals $q_j$,
$j = 1, \ldots, J$, on the set $\mathcal{P}(\mathcal{X})$ of probability distributions over the alphabet $\mathcal{X}$, and given any
probability measure $\mu$ on the Borel $\sigma$-algebra of $\mathcal{P}(\mathcal{X})$, there exist $J$ elements $Q_1, \ldots, Q_J$ of
$\mathcal{P}(\mathcal{X})$ and $J$ non-negative reals, $\alpha_1, \ldots, \alpha_J$, such that $\sum_{j=1}^{J} \alpha_j = 1$ and for every $j = 1, \ldots, J$

$$\int q_j(Q) \mu(dQ) = \sum_{i=1}^{J} \alpha_i q_j(Q_i). \tag{94}$$

Before we actually apply the support lemma, we first rewrite the relevant conditional mutual
informations and the distortion functions in a more convenient form for the use of this
lemma, by taking advantage of the Markov structures. We begin with $I(X; W_1)$:

$$I(X; W_1) = H(X) - H(X|W_1), \tag{95}$$

and in the same manner, $I(X; W_2|W_1)$ becomes

$$I(X; W_2|W_1) = H(X|W_1) - H(X|W_1, W_2). \tag{96}$$

For a given joint distribution of $(X, Y, Z)$, $H(X)$ is given and unaffected by $W_1$ and
$W_2$. Therefore, in order to preserve prescribed values of $I(X; W_1)$ and $I(X; W_2|W_1)$, it is
sufficient to preserve the associated values of $H(X|W_1)$ and $H(X|W_1, W_2)$.

We first invoke the support lemma in order to reduce the alphabet size of $W_1$, while
preserving the values of $H(X|W_1)$ and $H(X|W_1, W_2)$, as well as the distortions in both
decoders at both stages of communication. The alphabet of $W_2$ is still kept intact at this
step. Define the following functionals of a generic distribution $Q$ over $\mathcal{X} \times \mathcal{W}_2$, where $\mathcal{X}$ is
assumed, without loss of generality, to be $\{1, 2, \ldots, \alpha\}$, $\alpha = |\mathcal{X}|$:

\begin{align}
q_i(Q) &= \sum_{w_2} Q(x, w_2), \quad i = x = 1, 2, \ldots, \alpha - 1, \tag{97} \\
q_{\alpha}(Q) &= -\sum_{x, w_2} Q(x, w_2) \log \sum_{w_2'} Q(x, w_2'), \tag{98}
\end{align}

and

$$q_{\alpha+1}(Q) = -\sum_{x, w_2} Q(x, w_2) \log Q(x|w_2). \tag{99}$$
X,W2 
Also, we define

\begin{align}
q_{\alpha+2}(Q) &= \sum_{y} \min_{\hat{x}} \sum_{x, w_2} Q(x, w_2) P(y|x) d_{y,1}(x, \hat{x}), \tag{100} \\
q_{\alpha+3}(Q) &= \sum_{z} \min_{\hat{x}} \sum_{x, w_2} Q(x, w_2) P(z|x) d_{z,1}(x, \hat{x}), \tag{101} \\
q_{\alpha+4}(Q) &= \sum_{y} \min_{\hat{x}} \sum_{x, w_2} Q(x, w_2) P(y|x) d_{y,2}(x, \hat{x}) \tag{102}
\end{align}

and

$$q_{\alpha+5}(Q) = \sum_{z} \min_{\hat{x}} \sum_{x, w_2} Q(x, w_2) P(z|x) d_{z,2}(x, \hat{x}), \tag{103}$$

which along with (98) and (99) help us to preserve the rate and distortion constraints.
Applying now the support lemma for the above defined functionals, we find that there 
exists a random variable W\ (jointly distributed with (X,Y, Z,W 2 ), whose alphabet size is 
\W\ \ = \X\ + 5 and it satisfies simultaneously: 

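$$\sum_{w_1} \Pr\{W_1 = w_1\}\, q_i(P(\cdot|w_1)) = P_X(x), \quad i = x = 1, 2, \ldots, \alpha - 1, \tag{104}$$

$$\sum_{w_1} \Pr\{W_1 = w_1\}\, q_\alpha(P(\cdot|w_1)) = H(X|W_1), \tag{105}$$

$$\sum_{w_1} \Pr\{W_1 = w_1\}\, q_{\alpha+1}(P(\cdot|w_1)) = H(X|W_1, W_2), \tag{106}$$

$$\sum_{w_1} \Pr\{W_1 = w_1\}\, q_{\alpha+2}(P(\cdot|w_1)) = \min_{G_{y,1}} E\,d(X, G_{y,1}(Y, W_1)), \tag{107}$$

$$\sum_{w_1} \Pr\{W_1 = w_1\}\, q_{\alpha+3}(P(\cdot|w_1)) = \min_{G_{z,1}} E\,d(X, G_{z,1}(Z, W_1)), \tag{108}$$

$$\sum_{w_1} \Pr\{W_1 = w_1\}\, q_{\alpha+4}(P(\cdot|w_1)) = \min_{G_{y,2}} E\,d(X, G_{y,2}(Y, W_1, W_2)) \tag{109}$$

and

$$\sum_{w_1} \Pr\{W_1 = w_1\}\, q_{\alpha+5}(P(\cdot|w_1)) = \min_{G_{z,2}} E\,d(X, G_{z,2}(Z, W_1, W_2)). \tag{110}$$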
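The averaging relations (105) and (106) can be checked numerically on a small example. The following is a minimal sketch, not part of the original proof: the alphabet sizes, the random joint distribution and all function names are illustrative assumptions. It draws a random joint pmf of $(X, W_1, W_2)$, evaluates the functionals (98) and (99) on each conditional distribution $P(\cdot|w_1)$, and compares their averages with $H(X|W_1)$ and $H(X|W_1, W_2)$.

```python
import itertools
import math
import random


def rand_pmf(n, rng):
    """Random strictly positive pmf on n points (illustrative only)."""
    weights = [rng.random() + 0.05 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def marginal(joint, keep):
    """Marginal of a pmf given as {outcome tuple: prob} on the coordinates in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(pmf):
    """Entropy in bits of a pmf given as a dict."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def q_alpha(Q):
    """Functional (98): entropy of the X-marginal of Q, where Q = {(x, w2): prob}."""
    px = marginal(Q, (0,))
    return -sum(p * math.log2(px[(x,)]) for (x, _), p in Q.items() if p > 0)


def q_alpha_plus_1(Q):
    """Functional (99): conditional entropy of X given W2 under Q."""
    pw2 = marginal(Q, (1,))
    return -sum(p * math.log2(p / pw2[(w2,)]) for (_, w2), p in Q.items() if p > 0)


rng = random.Random(0)
outcomes = list(itertools.product(range(3), range(2), range(2)))  # (x, w1, w2)
joint = dict(zip(outcomes, rand_pmf(len(outcomes), rng)))

avg_q_alpha = 0.0
avg_q_alpha_plus_1 = 0.0
for (w1,), p_w1 in marginal(joint, (1,)).items():
    # Conditional distribution P(x, w2 | w1), the argument of the functionals.
    Q = {(x, w2): p / p_w1 for (x, a, w2), p in joint.items() if a == w1}
    avg_q_alpha += p_w1 * q_alpha(Q)
    avg_q_alpha_plus_1 += p_w1 * q_alpha_plus_1(Q)

h_x_given_w1 = entropy(marginal(joint, (0, 1))) - entropy(marginal(joint, (1,)))
h_x_given_w1_w2 = entropy(joint) - entropy(marginal(joint, (1, 2)))

print(abs(avg_q_alpha - h_x_given_w1) < 1e-9)
print(abs(avg_q_alpha_plus_1 - h_x_given_w1_w2) < 1e-9)
```

Both comparisons should agree up to floating-point error, since averaging (98) and (99) over $W_1$ is just the chain-rule expansion of the two conditional entropies.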






Having found a random variable $W_1$, we now proceed to reduce the alphabet of $W_2$ in a similar manner, where this time we have $\beta = |\mathcal{X}| \cdot |\mathcal{W}_1| - 1$ constraints to preserve the joint distribution of $(X, W_1)$, just defined, and 3 more constraints to preserve the second-stage rate and distortions. Applying the support lemma, we obtain that $W_2$ satisfies all the desired rate-distortion constraints and the necessary alphabet size of $W_2$ is upper-bounded by

$$|\mathcal{W}_2| \le |\mathcal{X}| \cdot |\mathcal{W}_1| + 2. \tag{111}$$

This completes the proof of the converse part of Theorem 2.
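The counting behind the two alphabet bounds can be made explicit with a small helper. This is only an illustrative sketch (the function name is hypothetical): it tallies the functionals used for $W_1$, namely $|\mathcal{X}| - 1$ for the source distribution, 2 for the entropies and 4 for the distortions, and then the $\beta + 3$ constraints used for $W_2$.

```python
def causal_alphabet_bounds(x_size: int) -> tuple[int, int]:
    """Alphabet-size bounds for W1 and W2 from the constraint counts in the text."""
    # (|X| - 1) functionals (97), two entropy functionals (98)-(99),
    # and four distortion functionals (100)-(103).
    w1_size = (x_size - 1) + 2 + 4
    # beta = |X| * |W1| - 1 constraints for the joint law of (X, W1),
    # plus 3 constraints for the second-stage rate and distortions.
    w2_size = (x_size * w1_size - 1) + 3
    return w1_size, w2_size


print(causal_alphabet_bounds(2))  # binary source -> (7, 16)
```

For a binary source this gives $|\mathcal{W}_1| = 7$ and $|\mathcal{W}_2| \le 16$, in agreement with (111).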
5.2 Proof of the Direct Part of Theorem 2

Let $W_1$, $W_2$, $G_{y,1}$, $G_{y,2}$, $G_{z,1}$ and $G_{z,2}$ be some elements in the definition of $\mathcal{R}^*(D)_c$ that achieve a given point in that region. We next describe the mechanisms of random code selection and the encoding and decoding operations.

Code Generation:

Let $\epsilon_1 > 0$, $\epsilon_2 > 0$ and $\delta > 0$ be arbitrarily small and select $R_1 > I(X; W_1) + \epsilon_1 + \delta$ and $\Delta R = R_2 - R_1$, $\Delta R > I(X; W_2|W_1) + \epsilon_2 + \delta$. For the first stage, $2^{NR_1}$ sequences of length $N$, $\{W_1(k)\}$, $k \in \{1, \ldots, 2^{NR_1}\}$, are drawn independently from $T_{P_{W_1}}$. Let us denote the set of these sequences by $\mathcal{C}_1$. For each codeword $W_1(k) = \mathbf{w}_1$, a set of second-stage codewords $\{W_2(k,j)\}$, $j \in \{1, \ldots, 2^{NR_2}\}$, is independently drawn from $T_{P_{W_2|W_1}}(\mathbf{w}_1)$. We denote this set by $\mathcal{C}_2(k)$ and its elements by $\{W_2(k,j)\}$. Note that the $2^{NR_1}$ sets $\{\mathcal{C}_2(\cdot)\}$ may not all be mutually exclusive.

Encoding: 

Upon receiving a source sequence x, the encoder acts as follows: 

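1. If $\mathbf{x} \in T_{P_X}$ and the codebook $\mathcal{C}_1$ contains a sequence $W_1(k) = \mathbf{w}_1$ s.t. the pair $(\mathbf{x}, \mathbf{w}_1) \in T^{2\delta}_{P_{XW_1}}$, the first such index $k$ is chosen for transmission at the first stage. Next, if the codebook $\mathcal{C}_2(k)$ contains a sequence $W_2(k,j) = \mathbf{w}_2$ s.t. $(\mathbf{x}, \mathbf{w}_1, \mathbf{w}_2) \in T^{3\delta}_{P_{XW_1W_2}}$, the first such index $j$ is chosen for transmission at the second stage.

2. If $\mathbf{x} \notin T_{P_X}$, or $\nexists\, W_1(k) = \mathbf{w}_1$ s.t. $(\mathbf{x}, \mathbf{w}_1) \in T^{2\delta}_{P_{XW_1}}$, or $\nexists\, W_2(k,j) = \mathbf{w}_2$ s.t. $(\mathbf{x}, \mathbf{w}_1, \mathbf{w}_2) \in T^{3\delta}_{P_{XW_1W_2}}$, an arbitrary error message is transmitted at both stages.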
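The first-match rule of the two encoding steps can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's construction: the function `two_stage_encode`, the zero-based indices, and the Hamming-distance predicates standing in for the typicality tests are all hypothetical; only the control flow (first matching $k$, then first matching $j$ within $\mathcal{C}_2(k)$, otherwise an error message) follows the text.

```python
from typing import Callable, Optional, Sequence, Tuple

Seq = Tuple[int, ...]


def two_stage_encode(
    x: Seq,
    c1: Sequence[Seq],
    c2: Sequence[Sequence[Seq]],
    source_typical: Callable[[Seq], bool],
    pair_typical: Callable[[Seq, Seq], bool],
    triple_typical: Callable[[Seq, Seq, Seq], bool],
) -> Optional[Tuple[int, int]]:
    """Return the indices (k, j) sent at the two stages, or None for the error message."""
    if not source_typical(x):
        return None
    for k, w1 in enumerate(c1):
        if pair_typical(x, w1):  # first index k whose codeword matches x
            for j, w2 in enumerate(c2[k]):
                if triple_typical(x, w1, w2):  # first index j within C2(k)
                    return k, j
            return None  # no second-stage match in C2(k)
    return None  # no first-stage match in C1


def hamming(a: Seq, b: Seq) -> int:
    """Toy stand-in used below in place of a real typicality test."""
    return sum(u != v for u, v in zip(a, b))


x = (0, 1, 1, 0)
c1 = [(1, 0, 0, 1), (0, 1, 1, 1)]
c2 = [[(0, 0, 0, 0)], [(1, 1, 1, 1), (0, 1, 1, 0)]]
result = two_stage_encode(
    x, c1, c2,
    source_typical=lambda s: True,
    pair_typical=lambda s, w1: hamming(s, w1) <= 1,
    triple_typical=lambda s, w1, w2: hamming(s, w2) == 0,
)
print(result)  # (1, 1) with zero-based indices
```

Passing the predicates as arguments keeps the sketch independent of any particular definition of the typical sets.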
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Decoding: 

The decoders of the first stage retrieve the first-stage codeword according to its index and generate the reproductions by $\hat{X}_{y,1,i} = G_{y,1}(Y_i, W_{1,i}(k))$ and $\hat{X}_{z,1,i} = G_{z,1}(Z_i, W_{1,i}(k))$, $i \in \{1, 2, \ldots, N\}$, where the subscripts of $\hat{X}$ indicate the decoder and the stage. Similarly, the decoders of the second stage retrieve both the first-stage and the second-stage codewords and create the reconstructions of the source according to $\hat{X}_{y,2,i} = G_{y,2}(Y_i, W_{1,i}(k), W_{2,i}(k,j))$ and $\hat{X}_{z,2,i} = G_{z,2}(Z_i, W_{1,i}(k), W_{2,i}(k,j))$, $i \in \{1, 2, \ldots, N\}$.

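We now turn to the analysis of the error probability and the distortions. For each $\mathbf{x}$ and a particular choice of codes $\mathcal{C}_1$ and $\{\mathcal{C}_2(\cdot)\}$, the possible causes for an error message are:

1. $\mathbf{x} \notin T_{P_X}$. Let the probability of this event be defined as $P_{e_1}$.

2. $\mathbf{x} \in T_{P_X}$, but in the codebook $\mathcal{C}_1$ $\nexists\, \mathbf{w}_1$ s.t. $(\mathbf{x}, \mathbf{w}_1) \in T^{2\delta}_{P_{XW_1}}$. Let the probability of this event be defined as $P_{e_2}$.

3. $\mathbf{x} \in T_{P_X}$, and the codebook $\mathcal{C}_1$ contains $\mathbf{w}_1$ s.t. $(\mathbf{x}, \mathbf{w}_1) \in T^{2\delta}_{P_{XW_1}}$, but $\nexists\, \mathbf{w}_2 \in \mathcal{C}_2(k)$ s.t. $(\mathbf{x}, \mathbf{w}_1, \mathbf{w}_2) \in T^{3\delta}_{P_{XW_1W_2}}$. Let the probability of this event be defined as $P_{e_3}$.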

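Note that if none of those events occur, then, for sufficiently large $N$, by the Markov Lemma [12, p. 436, Lemma 14.8.1] applied twice, the following is satisfied at both stages $k = 1, 2$: with high probability, $(\mathbf{X}, \mathbf{Z}, \hat{\mathbf{X}}_{z,k})$ are jointly typical and $(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{X}}_{y,k})$ are jointly typical, where $\hat{\mathbf{X}}_{y,k}$ and $\hat{\mathbf{X}}_{z,k}$ denote the stage-$k$ reconstructions at decoders Y and Z. In particular, the first application of the Markov Lemma occurs due to the Markov chain $(W_1, W_2) \leftrightarrow X \leftrightarrow (Y, Z)$: note that, by the way of creation, $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ are jointly typical with high probability and also, with high probability, $\mathbf{X}$, $\mathbf{W}_1$ and $\mathbf{W}_2$ are jointly typical. Therefore, by the Markov Lemma, $(\mathbf{X}, \mathbf{Y}, \mathbf{Z}, \mathbf{W}_1, \mathbf{W}_2)$ are also jointly typical with high probability. Also, note that due to the fact that the source is memoryless and by the way of creation of the reconstructions, the following Markov chains hold: $X \leftrightarrow (Y, W_1) \leftrightarrow \hat{X}_{y,1}$ and $X \leftrightarrow (Z, W_1) \leftrightarrow \hat{X}_{z,1}$, and also, at the second stage, $X \leftrightarrow (Y, W_1, W_2) \leftrightarrow \hat{X}_{y,2}$ and $X \leftrightarrow (Z, W_1, W_2) \leftrightarrow \hat{X}_{z,2}$. By the second application of the Markov Lemma, we obtain that with high probability $\mathbf{X}$ is jointly typical with $\hat{\mathbf{X}}_{y,1}$ and $\hat{\mathbf{X}}_{z,1}$ at the first stage and with $\hat{\mathbf{X}}_{y,2}$ and $\hat{\mathbf{X}}_{z,2}$ at the refinement stage. The probability that one or more of the above typicality relations do not hold vanishes as $N$ becomes infinitely large. The joint typicality of $(\mathbf{X}, \hat{\mathbf{X}}_{y,1})$, $(\mathbf{X}, \hat{\mathbf{X}}_{z,1})$, $(\mathbf{X}, \hat{\mathbf{X}}_{y,2})$ and $(\mathbf{X}, \hat{\mathbf{X}}_{z,2})$ implies that the distortion constraints (23) - (26) are satisfied when $N$ is large enough.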

It remains to show that the probability of sending an error message, $P_e$, vanishes when $N$ is large enough. $P_e$ is bounded by

$$P_e \le P_{e_1} + P_{e_2} + P_{e_3}. \tag{112}$$

The fact that $P_{e_1} \to 0$ follows from the properties of typical sequences [12]. As for $P_{e_2}$, we have:

$$P_{e_2} = \prod_{k=1}^{2^{NR_1}} \Pr\left\{(\mathbf{x}, W_1(k)) \notin T^{2\delta}_{P_{XW_1}}\right\}. \tag{113}$$

Now, for every $k$:

$$\Pr\left\{(\mathbf{x}, W_1(k)) \notin T^{2\delta}_{P_{XW_1}}\right\} = 1 - \Pr\left\{(\mathbf{x}, W_1(k)) \in T^{2\delta}_{P_{XW_1}}\right\} \le 1 - 2^{-N[I(X;W_1)+\epsilon_1]}, \tag{114}$$

where the last inequality follows from the sizes of the typical sets as given in [12]. Substitution of (114) into (113) and application of the well-known inequality $(1 - v)^N \le \exp(-vN)$ provides us with the following upper bound for $N \to \infty$:

$$P_{e_2} \le \left[1 - 2^{-N[I(X;W_1)+\epsilon_1]}\right]^{2^{NR_1}} \le \exp\left\{-2^{NR_1} \cdot 2^{-N[I(X;W_1)+\epsilon_1]}\right\} \to 0, \tag{115}$$

double-exponentially rapidly since $R_1 > I(X; W_1) + \epsilon_1 + \delta$.
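To get a feel for how fast the bound in (115) decays, one can evaluate its logarithm for a few block lengths. The sketch below is illustrative only: the numerical values of the mutual information, $\epsilon_1$ and $\delta$ are arbitrary assumptions, and the function name is hypothetical. It works with natural logarithms to avoid underflow and compares $\ln(1-v)^M$ with the relaxed exponent $-vM$, where $M = 2^{NR_1}$ and $v = 2^{-N[I(X;W_1)+\epsilon_1]}$.

```python
import math


def log_covering_failure_bounds(n, rate, mutual_info, eps):
    """Return (ln of (1 - v)^M, ln of exp(-v * M)) for M = 2^(n*rate), v = 2^(-n*(I + eps))."""
    num_codewords = 2.0 ** (n * rate)
    v = 2.0 ** (-n * (mutual_info + eps))
    log_product = num_codewords * math.log1p(-v)  # exact exponent of the product bound
    log_relaxed = -num_codewords * v              # exponent after applying (1 - v)^M <= exp(-v M)
    return log_product, log_relaxed


mutual_info, eps, delta = 0.5, 0.01, 0.05  # assumed values, for illustration only
rate = mutual_info + eps + delta           # smallest rate allowed by the code construction

for n in (50, 100, 200):
    log_product, log_relaxed = log_covering_failure_bounds(n, rate, mutual_info, eps)
    print(n, log_product, log_relaxed)
```

With the rate chosen exactly $\delta$ above $I(X;W_1) + \epsilon_1$, the relaxed exponent equals $-2^{N\delta}$, so the bound itself behaves like $\exp(-2^{N\delta})$, which is the double-exponential decay claimed in the text.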

To estimate $P_{e_3}$, we repeat the technique of the previous step:

$$P_{e_3} = \prod_{j=1}^{2^{NR_2}} \Pr\left\{(\mathbf{x}, \mathbf{w}_1, W_2(k,j)) \notin T^{3\delta}_{P_{XW_1W_2}}\right\}. \tag{116}$$

Again, by the property of the typical sequences, for every $j$:

$$\Pr\left\{(\mathbf{x}, \mathbf{w}_1, W_2(k,j)) \notin T^{3\delta}_{P_{XW_1W_2}}\right\} \le 1 - 2^{-N[I(X;W_2|W_1)+\epsilon_2]}, \tag{117}$$

and therefore, substitution of (117) into (116) gives

$$P_{e_3} \le \left[1 - 2^{-N[I(X;W_2|W_1)+\epsilon_2]}\right]^{2^{NR_2}} \le \exp\left\{-2^{NR_2} \cdot 2^{-N[I(X;W_2|W_1)+\epsilon_2]}\right\} \to 0, \tag{118}$$

double-exponentially rapidly since $R_2 > I(X; W_2|W_1) + \epsilon_2 + \delta$.

Since $P_{e_i} \to 0$ for $i = 1, 2, 3$, their sum tends to zero as well, implying that there exists at least one choice of a codebook $\mathcal{C}_1$ and related choices of sets $\{\mathcal{C}_2(\cdot)\}$ that give rise to reliable source reconstruction at both stages with communication rates $R_1$ and $\Delta R = R_2 - R_1$.
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6 Proofs for the Non-Causal Case 
6.1 Outer Bound 

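The proof of the outer bound follows the lines of the proof of Theorem 1 in [4]. Assume that we have an $(N, M_1, M_2, \{\Delta_{y,k}, \Delta_{z,k}\}_{k=1}^{2})$ SR code for the source $X$ with SI $(Y, Z)$, as in Definition 1. We will show the existence of a quintuplet $(W_1, W_2, W_3, W_4, V)$ that satisfies the conditions 1-4 in the definition of $\mathcal{R}^{**}(D)_{nc}$. First, note that

$$\begin{aligned}
NR_1 &\ge H(f_1) \ge I(\mathbf{X}; f_1|\mathbf{Y}) = I(\mathbf{X}; f_1, \mathbf{Z}|\mathbf{Y}) - I(\mathbf{X}; \mathbf{Z}|f_1, \mathbf{Y}) \\
&= \sum_{i=1}^{N} \left[ I(X_i; f_1, \mathbf{Z}|X^{i-1}, \mathbf{Y}) - I(\mathbf{X}; Z_i|f_1, \mathbf{Y}, Z^{i-1}) \right].
\end{aligned} \tag{119}$$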

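For notational convenience, we denote $Z^{i-1} Z_{i+1}^{N} = Z^{N \setminus i}$ and use a similar notation for $X$ and $Y$. Since $(X_i, Y_i)$ and $(X^{i-1}, Y^{N \setminus i})$ are independent, we have, for the first term in the summand of (119):

$$\begin{aligned}
I(X_i; f_1, \mathbf{Z}|X^{i-1}, \mathbf{Y}) &= H(X_i|Y_i, X^{i-1}, Y^{N \setminus i}) - H(X_i|Y_i, X^{i-1}, Y^{N \setminus i}, f_1, \mathbf{Z}) \\
&= H(X_i|Y_i) - H(X_i|Y_i, X^{i-1}, Y^{N \setminus i}, f_1, \mathbf{Z}) \\
&= I(X_i; X^{i-1}, Y^{N \setminus i}, f_1, \mathbf{Z}|Y_i).
\end{aligned} \tag{120}$$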

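Next, due to the Markov structure

$$Z_i \leftrightarrow (X_i, Y_i) \leftrightarrow (X^{N \setminus i}, f_1, Z^{i-1}, Y^{N \setminus i}) \tag{121}$$

we have, for the second term in the summand of (119):

$$\begin{aligned}
I(\mathbf{X}; Z_i|f_1, \mathbf{Y}, Z^{i-1}) &= H(Z_i|f_1, \mathbf{Y}, Z^{i-1}) - H(Z_i|\mathbf{X}, f_1, \mathbf{Y}, Z^{i-1}) \\
&= H(Z_i|f_1, \mathbf{Y}, Z^{i-1}) - H(Z_i|X_i, f_1, \mathbf{Y}, Z^{i-1}) \\
&= I(X_i; Z_i|f_1, \mathbf{Y}, Z^{i-1}).
\end{aligned} \tag{122}$$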

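Substituting (120) and (122) in (119), we obtain

$$\begin{aligned}
NR_1 &\ge \sum_{i=1}^{N} \left[ I(X_i; X^{i-1}, Y^{N \setminus i}, f_1, \mathbf{Z}|Y_i) - I(X_i; Z_i|f_1, \mathbf{Y}, Z^{i-1}) \right] \\
&= \sum_{i=1}^{N} \left[ I(X_i; Y^{N \setminus i}, f_1, Z^{i-1}|Y_i) + I(X_i; X^{i-1}, Z_i^{N}|Y_i, f_1, Y^{N \setminus i}, Z^{i-1}) - I(X_i; Z_i|f_1, \mathbf{Y}, Z^{i-1}) \right] \\
&= \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}, Z^{i-1}|Y_i) + I(X_i; X^{i-1}, Z_{i+1}^{N}|Y_i, Z_i, f_1, Y^{N \setminus i}, Z^{i-1}) \right].
\end{aligned} \tag{123}$$

The Markovity of $X \leftrightarrow Z \leftrightarrow Y$ implies

(124)

and we have for the second term in (123):

$$\begin{aligned}
I(X_i; X^{i-1}, Z_{i+1}^{N}|Y_i, Z_i, f_1, Y^{N \setminus i}, Z^{i-1})
&= H(X_i|f_1, \mathbf{Y}, Z^{i}) - H(X_i|f_1, \mathbf{Y}, \mathbf{Z}, X^{i-1}) \\
&= H(X_i, Y_i|f_1, Y^{N \setminus i}, Z^{i}) - H(Y_i|f_1, Y^{N \setminus i}, Z^{i}) - H(X_i|f_1, \mathbf{Y}, \mathbf{Z}, X^{i-1}) \\
&= H(Y_i|X_i, f_1, Y^{N \setminus i}, Z^{i}) + H(X_i|f_1, Y^{N \setminus i}, Z^{i}) - H(Y_i|f_1, Y^{N \setminus i}, Z^{i}) - H(X_i|f_1, \mathbf{Y}, \mathbf{Z}, X^{i-1}) \\
&\overset{(a)}{=} H(X_i|f_1, Y^{N \setminus i}, Z^{i}) - H(X_i|f_1, \mathbf{Y}, \mathbf{Z}, X^{i-1}) \\
&= I(X_i; Y_i, Z_{i+1}^{N}, X^{i-1}|f_1, Y^{N \setminus i}, Z^{i}) \\
&\overset{(b)}{=} I(X_i; Z_{i+1}^{N}, X^{i-1}|f_1, Y^{N \setminus i}, Z^{i}),
\end{aligned} \tag{125}$$

where in (a) we used the Markov chain $X_i \leftrightarrow (f_1, Y^{N \setminus i}, Z^{i}) \leftrightarrow Y_i$. To justify (b), note that $f_1$ is a function of $\mathbf{X}$ and, due to this feature, the fact that the source is a DMS and the Markov condition $X \leftrightarrow Z \leftrightarrow Y$, we obtain that $X_i \leftrightarrow (f_1, Y^{N \setminus i}, \mathbf{Z}, X^{i-1}) \leftrightarrow Y_i$.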
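Substituting (125) in (123), we get

$$\begin{aligned}
NR_1 &\ge \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}, Z^{i-1}|Y_i) + I(X_i; Z_{i+1}^{N}, X^{i-1}|f_1, Y^{N \setminus i}, Z^{i}) \right] && (126) \\
&\ge \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}, Z^{i-1}|Y_i) + I(X_i; Z_{i+1}^{N}|f_1, Y^{N \setminus i}, Z^{i}) \right] && (127) \\
&\overset{(a)}{=} \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}|Y_i) + I(X_i; Z^{i-1}|f_1, \mathbf{Y}) + I(Z_i; Z^{i-1}|f_1, \mathbf{Y}, X_i) + I(X_i; Z_{i+1}^{N}|f_1, Y^{N \setminus i}, Z^{i}) \right] \\
&= \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}|Y_i) + I(X_i, Z_i; Z^{i-1}|f_1, \mathbf{Y}) + I(X_i; Z_{i+1}^{N}|f_1, Y^{N \setminus i}, Z^{i}) \right] \\
&= \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}|Y_i) + I(Z_i; Z^{i-1}|f_1, \mathbf{Y}) + I(X_i; Z^{i-1}|f_1, \mathbf{Y}, Z_i) + I(X_i; Z_{i+1}^{N}|f_1, Y^{N \setminus i}, Z^{i}) \right] \\
&\ge \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}|Y_i) + I(X_i; Z^{i-1}|f_1, \mathbf{Y}, Z_i) + I(X_i; Z_{i+1}^{N}|f_1, Y^{N \setminus i}, Z^{i}) \right] \\
&\overset{(b)}{=} \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}|Y_i) + I(X_i; Z^{i-1}|f_1, Y^{N \setminus i}, Z_i) + I(X_i; Z_{i+1}^{N}|f_1, Y^{N \setminus i}, Z^{i}) \right] \\
&= \sum_{i=1}^{N} \left[ I(X_i; f_1, Y^{N \setminus i}|Y_i) + I(X_i; Z^{N \setminus i}|f_1, Y^{N \setminus i}, Z_i) \right], && (128)
\end{aligned}$$

where (a) is due to the Markov relation $Z_i \leftrightarrow (f_1, \mathbf{Y}, X_i) \leftrightarrow Z^{i-1}$ and (b) is due to the Markov chain $Y_i \leftrightarrow Z_i \leftrightarrow (f_1, Y^{N \setminus i}, Z^{i-1}, X_i)$ that implies $Y_i \leftrightarrow (f_1, Y^{N \setminus i}, Z_i) \leftrightarrow Z^{i-1}$ and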
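Before defining the auxiliary random variables, we bound $R_2$ from below. We do that by repeating the steps (119) - (126) of lower-bounding $R_1$, with the pair $(f_1, f_2)$ substituting $f_1$ in each step:

$$\begin{aligned}
NR_2 &\ge H(f_1, f_2) \ge I(\mathbf{X}; f_1, f_2|\mathbf{Y}) \ge I(\mathbf{X}; f_1, f_2, \mathbf{Z}|\mathbf{Y}) - I(\mathbf{X}; \mathbf{Z}|f_1, f_2, \mathbf{Y}) \\
&\ge \sum_{i=1}^{N} \left[ I(X_i; f_1, f_2, Y^{N \setminus i}, Z^{i-1}|Y_i) + I(X_i; Z_{i+1}^{N}, X^{i-1}|f_1, f_2, Y^{N \setminus i}, Z^{i}) \right].
\end{aligned} \tag{129}$$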

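Define the random variables $W_{1,i} = (f_1, Y^{N \setminus i})$, $V_i = Z^{i-1}$, $W_{2,i} = Z_{i+1}^{N}$, $W_{3,i} = f_2$ and $W_{4,i} = X^{i-1}$. With these definitions we have the Markov structure

$$(W_{1,i}, W_{2,i}, W_{3,i}, W_{4,i}, V_i) \leftrightarrow X_i \leftrightarrow Z_i \leftrightarrow Y_i \tag{130}$$

and the bounds (128) and (129) become

$$R_1 \ge \frac{1}{N} \sum_{i=1}^{N} \left[ I(X_i; W_{1,i}|Y_i) + I(X_i; W_{2,i}, V_i|W_{1,i}, Z_i) \right], \tag{131}$$

$$R_2 \ge \frac{1}{N} \sum_{i=1}^{N} \left[ I(X_i; W_{1,i}, W_{3,i}, V_i|Y_i) + I(X_i; W_{2,i}, W_{4,i}|W_{1,i}, W_{3,i}, V_i, Z_i) \right]. \tag{132}$$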

Let $J$ be a random variable, independent of $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$, and uniformly distributed over the set $\{1, 2, \ldots, N\}$. Define the random variables $W_1 = (J, W_{1,J})$, $V = (J, V_J)$, $W_2 = (J, W_{2,J})$, $W_3 = (J, W_{3,J})$ and $W_4 = (J, W_{4,J})$. The Markov relations (130) still hold, that is

$$(W_1, W_2, W_3, W_4, V) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y, \tag{133}$$

and therefore condition 1 in the definition of $\mathcal{R}^{**}(D)_{nc}$ is satisfied.

We proceed to show the existence of functions $G_{y,1}$, $G_{z,1}$, $G_{y,2}$ and $G_{z,2}$ satisfying the second condition. Denote by $g_{y,k,l}$ and $g_{z,k,l}$ the output of the Y and Z decoders, respectively, at iteration $k$ and time $l$, $k = 1, 2$, $1 \le l \le N$. The random variable $W_1$ contains $(f_1, Y^{N \setminus J})$. At the same time, the triplet $(W_1, V, W_2)$ contains $(f_1, Z^{N \setminus J})$, and so on. Therefore, let us choose the functions $G_{y,1}$, $G_{z,1}$, $G_{y,2}$ and $G_{z,2}$ as follows:

$$G_{y,1,J}(Y, W_1) = g_{y,1,J}(\mathbf{Y}, f_1), \tag{134}$$

$$G_{z,1,J}(Z, W_1, W_2, V) = g_{z,1,J}(\mathbf{Z}, f_1), \tag{135}$$

$$G_{y,2,J}(Y, W_1, W_3, V) = g_{y,2,J}(\mathbf{Y}, f_1, f_2), \tag{136}$$

$$G_{z,2,J}(Z, W_1, W_2, W_3, W_4, V) = g_{z,2,J}(\mathbf{Z}, f_1, f_2). \tag{137}$$

Footnote 6: Note that different choices of auxiliary RVs are possible. For example, one may choose: $W_{1,i} = (f_1, Y^{N \setminus i})$, $V_i = (W_{1,i}, Z^{i-1})$, $W_{2,i} = (V_i, Z_{i+1}^{N})$, $W_{3,i} = (V_i, f_2)$, $W_{4,i} = (W_{2,i}, W_{3,i}, X^{i-1})$. This choice would result in the following Markov chain: $W_{1,i} \leftrightarrow V_i \leftrightarrow (W_{2,i}, W_{3,i}) \leftrightarrow W_{4,i} \leftrightarrow X_i \leftrightarrow Z_i \leftrightarrow Y_i$.



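Then, for the distortions we have

$$E\,d_{y,1}(X, G_{y,1}(Y, W_1)) = \frac{1}{N} \sum_{j=1}^{N} E\,d_{y,1}(X_j, g_{y,1,j}(\mathbf{Y}, f_1)) \le \Delta_{y,1}, \tag{138}$$

$$E\,d_{z,1}(X, G_{z,1}(Z, W_1, W_2, V)) = \frac{1}{N} \sum_{j=1}^{N} E\,d_{z,1}(X_j, g_{z,1,j}(\mathbf{Z}, f_1)) \le \Delta_{z,1}, \tag{139}$$

$$E\,d_{y,2}(X, G_{y,2}(Y, W_1, W_3, V)) = \frac{1}{N} \sum_{j=1}^{N} E\,d_{y,2}(X_j, g_{y,2,j}(\mathbf{Y}, f_1, f_2)) \le \Delta_{y,2}, \tag{140}$$

$$E\,d_{z,2}(X, G_{z,2}(Z, W_1, W_2, W_3, W_4, V)) = \frac{1}{N} \sum_{j=1}^{N} E\,d_{z,2}(X_j, g_{z,2,j}(\mathbf{Z}, f_1, f_2)) \le \Delta_{z,2}. \tag{141}$$

Hence, condition 2 in the definition of $\mathcal{R}^{**}(D)_{nc}$ is satisfied.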

To prove that condition 4 of that definition holds, we have to show that the bounds (42) and (43) can be written in a single-letter form with $W_1$, $W_2$, $W_3$ and $W_4$. The following chain of equalities holds:

$$\begin{aligned}
I(X; W_1|Y) &= H(W_1|Y) - H(W_1|X, Y) \\
&= H(W_1|Y) - H(W_1|X) \\
&= I(W_1; X) - I(W_1; Y) \\
&= H(X) - H(X|W_1) - H(Y) + H(Y|W_1) \\
&= H(X) - H(X|J, W_{1,J}) - H(Y) + H(Y|J, W_{1,J}) \\
&= \frac{1}{N} \sum_{i=1}^{N} H(X_i) - \frac{1}{N} \sum_{i=1}^{N} H(X_i|W_{1,i}) - \frac{1}{N} \sum_{i=1}^{N} H(Y_i) + \frac{1}{N} \sum_{i=1}^{N} H(Y_i|W_{1,i}) \\
&= \frac{1}{N} \sum_{i=1}^{N} I(X_i; W_{1,i}|Y_i),
\end{aligned} \tag{142}$$

where the last equality is due to (130). In a similar manner, we get

$$\begin{aligned}
I(X; W_2, V|W_1, Z) &= I(X; J, W_{2,J}, J, V_J|J, W_{1,J}, Z) = I(X; W_{2,J}, V_J|J, W_{1,J}, Z) \\
&= H(X|J, W_{1,J}, Z) - H(X|J, W_{1,J}, W_{2,J}, V_J, Z) \\
&= \frac{1}{N} \sum_{i=1}^{N} H(X_i|W_{1,i}, Z_i) - \frac{1}{N} \sum_{i=1}^{N} H(X_i|W_{1,i}, W_{2,i}, V_i, Z_i) \\
&= \frac{1}{N} \sum_{i=1}^{N} I(X_i; W_{2,i}, V_i|W_{1,i}, Z_i).
\end{aligned} \tag{143}$$

In view of (142) and (143), the bound (131) can be written as

$$R_1 \ge I(X; W_1|Y) + I(X; W_2, V|W_1, Z). \tag{144}$$

In a similar manner, we show that (132) can be written as

$$R_2 \ge I(X; W_1, W_3, V|Y) + I(X; W_2, W_4|W_1, W_3, V, Z). \tag{145}$$

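Specifically,

$$\begin{aligned}
I(X; W_1, W_3, V|Y) &= H(W_1, W_3, V|Y) - H(W_1, W_3, V|X, Y) \\
&= H(W_1, W_3, V|Y) - H(W_1, W_3, V|X) \\
&= H(W_1, W_3, V) - H(W_1, W_3, V|X) - \left( H(W_1, W_3, V) - H(W_1, W_3, V|Y) \right) \\
&= I(W_1, W_3, V; X) - I(W_1, W_3, V; Y) \\
&= H(X) - H(X|W_1, W_3, V) - H(Y) + H(Y|W_1, W_3, V) \\
&= H(X) - H(X|J, W_{1,J}, J, W_{3,J}, J, V_J) - H(Y) + H(Y|J, W_{1,J}, J, W_{3,J}, J, V_J) \\
&= \frac{1}{N} \sum_{i=1}^{N} H(X_i) - \frac{1}{N} \sum_{i=1}^{N} H(X_i|W_{1,i}, W_{3,i}, V_i) - \frac{1}{N} \sum_{i=1}^{N} H(Y_i) + \frac{1}{N} \sum_{i=1}^{N} H(Y_i|W_{1,i}, W_{3,i}, V_i) \\
&= \frac{1}{N} \sum_{i=1}^{N} I(X_i; W_{1,i}, W_{3,i}, V_i|Y_i),
\end{aligned} \tag{146}$$

where the last equality is due to (130). In a similar manner, we get

$$\begin{aligned}
I(X; W_2, W_4|W_1, W_3, V, Z) &= I(X; J, W_{2,J}, J, W_{4,J}|J, W_{1,J}, J, W_{3,J}, V, Z) \\
&= I(X; W_{2,J}, W_{4,J}|J, W_{1,J}, W_{3,J}, V_J, Z) \\
&= H(X|J, W_{1,J}, W_{3,J}, V_J, Z) - H(X|J, W_{1,J}, W_{2,J}, W_{3,J}, W_{4,J}, V_J, Z) \\
&= \frac{1}{N} \sum_{i=1}^{N} H(X_i|W_{1,i}, W_{3,i}, V_i, Z_i) - \frac{1}{N} \sum_{i=1}^{N} H(X_i|W_{1,i}, W_{2,i}, W_{3,i}, W_{4,i}, V_i, Z_i) \\
&= \frac{1}{N} \sum_{i=1}^{N} I(X_i; W_{2,i}, W_{4,i}|W_{1,i}, W_{3,i}, V_i, Z_i).
\end{aligned} \tag{147}$$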

It is left to prove that the cardinality of the auxiliary RVs satisfies the third condition. This step of the proof extends the converse proof of [4] and is conceptually very similar to the above-detailed part of the converse proof of Theorem 2 that deals with reducing the cardinality of the alphabets of the auxiliary RVs. The detailed proof of this part is thus omitted, and to complete the proof of the converse we merely outline it. Here also we use the support lemma [19] and rewrite the relevant conditional mutual informations and the distortion functions in a form that is more convenient for the use of this lemma. Similarly to [5], we begin with the first term, $I(X; W_1|Y)$, in the lower bound to $R_1$, using the Markov chain $W_1 \leftrightarrow X \leftrightarrow Y$:

$$\begin{aligned}
I(X; W_1|Y) &= H(W_1|Y) - H(W_1|X, Y) \\
&= H(W_1|Y) - H(W_1|X) \\
&= H(W_1) - I(Y; W_1) - H(W_1) + I(X; W_1) \\
&= H(Y|W_1) - H(Y) - H(X|W_1) + H(X).
\end{aligned} \tag{148}$$

Next, we decompose the second term in the lower bound to $R_1$, $I(X; V, W_2|W_1, Z)$, into $I(X; V|W_1, Z)$ and $I(X; W_2|W_1, V, Z)$, and for $I(X; V|W_1, Z)$ we have, due to the Markov chain $(W_1, V) \leftrightarrow X \leftrightarrow Z$:

$$\begin{aligned}
I(X; V|W_1, Z) &= H(X|W_1, Z) - H(X|W_1, V, Z) \\
&= H(X|W_1) - I(X; Z|W_1) + I(X; Z|W_1, V) - H(X|W_1, V) \\
&= H(X|W_1) - H(Z|W_1) + H(Z|W_1, X) - H(X|W_1, V) + H(Z|W_1, V) - H(Z|W_1, V, X) \\
&= H(X|W_1) - H(Z|W_1) + H(Z|W_1, V) - H(X|W_1, V).
\end{aligned} \tag{149}$$

Using the Markov chain $(W_1, W_2, V) \leftrightarrow X \leftrightarrow Z$ for $I(X; W_2|W_1, V, Z)$, we have:

$$I(X; W_2|W_1, V, Z) = H(X|W_1, V) - H(Z|W_1, V) + H(Z|W_1, W_2, V) - H(X|W_1, W_2, V). \tag{150}$$

Similarly, $I(X; W_1, W_3, V|Y)$ can be decomposed into $I(X; W_1|Y)$, $I(X; V|W_1, Y)$ and $I(X; W_3|W_1, V, Y)$, with the latter two terms, in turn, expressed as

$$I(X; V|W_1, Y) = H(X|W_1) - H(Y|W_1) + H(Y|W_1, V) - H(X|W_1, V), \tag{151}$$

and

$$I(X; W_3|W_1, V, Y) = H(X|W_1, V) - H(Y|W_1, V) + H(Y|W_1, W_3, V) - H(X|W_1, W_3, V). \tag{152}$$

The second term in the lower bound to $R_2$ is $I(X; W_2, W_4|W_1, V, W_3, Z)$ and it can also be decomposed into

$$I(X; W_2|W_1, W_3, V, Z) = H(X|W_1, W_3, V) - H(Z|W_1, W_3, V) + H(Z|W_1, W_2, W_3, V) - H(X|W_1, W_2, W_3, V) \tag{153}$$

and

$$I(X; W_4|W_1, W_2, W_3, V, Z) = H(X|W_1, W_2, W_3, V) - H(Z|W_1, W_2, W_3, V) + H(Z|W_1, W_2, W_3, W_4, V) - H(X|W_1, W_2, W_3, W_4, V). \tag{154}$$

Thus, the lower bounds to $R_1$ and $R_2$ can be expressed as follows:

$$I(X; W_1|Y) + I(X; V, W_2|W_1, Z) = [H(X) - H(Y)] + [H(Y|W_1) - H(Z|W_1)] + [H(Z|W_1, W_2, V) - H(X|W_1, W_2, V)] \tag{155}$$

and

$$\begin{aligned}
I(X; W_1, W_3, V|Y) + I(X; W_2, W_4|W_1, W_3, V, Z) = {} & [H(X) - H(Y)] + [H(Y|W_1, W_3, V) - H(Z|W_1, W_3, V)] \\
& + [H(Z|W_1, W_2, W_3, W_4, V) - H(X|W_1, W_2, W_3, W_4, V)].
\end{aligned} \tag{156}$$

Since $[H(X) - H(Y)]$ is a constant that depends only on the given statistics of the source and the SI $Y$, in order to preserve prescribed values of the above lower bounds, it is sufficient to preserve the associated values of $[H(Y|W_1) - H(Z|W_1)] + [H(Z|W_1, W_2, V) - H(X|W_1, W_2, V)]$ and $[H(Y|W_1, W_3, V) - H(Z|W_1, W_3, V)] + [H(Z|W_1, W_2, W_3, W_4, V) - H(X|W_1, W_2, W_3, W_4, V)]$.

From here on the proof is essentially similar to the one provided for Theorem 2. The support lemma is first used to reduce the alphabet size of $W_1$, while preserving the values of (155) and (156) and the distortions at both stages. The alphabets of the remaining auxiliary RVs are kept intact at this stage of the proof. There are $|\mathcal{X}| - 1$ functionals to be defined that help to preserve the source distribution, 2 more to preserve (155) and (156), and 4 more functionals to preserve all the distortions at both stages. Thus, it is easy to show that it is possible to find an auxiliary RV $W_1$ whose necessary alphabet size is upper-bounded by $|\mathcal{X}| + 5$. Next, we reduce the alphabet size of $V$, where now, in addition to the values of the lower bounds and the distortions $\Delta_{z,1}$, $\Delta_{y,2}$ and $\Delta_{z,2}$, it is desired to preserve the joint distribution of $(X, W_1)$. There are $|\mathcal{X}||\mathcal{W}_1| - 1 + 2 + 3$ constraints imposed on $V$ and thus its alphabet size is upper-bounded by $|\mathcal{X}|(|\mathcal{X}| + 5) + 4$. In a similar manner, the reduction of the alphabet cardinality is further performed for $W_2$, $W_3$ and $W_4$, where at each stage the support lemma is applied so that the statistics of the source and of all already "reduced" RVs are maintained, as well as the lower bounds to the relevant rates and the distortions.
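The entropy rewriting of the first-stage rate bound, identity (155), can be sanity-checked numerically. The sketch below is an illustration under assumptions rather than part of the proof: it builds a random joint pmf that satisfies the Markov chain $(W_1, V, W_2) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$ (alphabet sizes, seed and helper names are arbitrary choices), and compares both sides of (155).

```python
import itertools
import math
import random

X, Z, Y, W1, V, W2 = range(6)  # coordinate positions inside each outcome tuple


def rand_pmf(n, rng):
    """Random strictly positive pmf on n points (illustrative only)."""
    weights = [rng.random() + 0.05 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(joint, keep):
    """Entropy in bits of the marginal of `joint` on the coordinates in `keep`."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_entropy(joint, a, b=()):
    """Conditional entropy H(A | B)."""
    return entropy(joint, tuple(a) + tuple(b)) - entropy(joint, tuple(b))


def cond_mi(joint, a, b, c=()):
    """Conditional mutual information I(A; B | C)."""
    return cond_entropy(joint, a, c) - cond_entropy(joint, a, tuple(b) + tuple(c))


rng = random.Random(1)
px = rand_pmf(3, rng)
pz_x = [rand_pmf(3, rng) for _ in range(3)]            # P(z | x)
py_z = [rand_pmf(2, rng) for _ in range(3)]            # P(y | z): Y depends on X only via Z
aux = list(itertools.product(range(2), repeat=3))      # outcomes of (w1, v, w2)
paux_x = [dict(zip(aux, rand_pmf(len(aux), rng))) for _ in range(3)]  # P(w1, v, w2 | x)

joint = {}
for x, z, y in itertools.product(range(3), range(3), range(2)):
    for a in aux:
        joint[(x, z, y) + a] = px[x] * pz_x[x][z] * py_z[z][y] * paux_x[x][a]

lhs = cond_mi(joint, (X,), (W1,), (Y,)) + cond_mi(joint, (X,), (V, W2), (W1, Z))
rhs = (
    (cond_entropy(joint, (X,)) - cond_entropy(joint, (Y,)))
    + (cond_entropy(joint, (Y,), (W1,)) - cond_entropy(joint, (Z,), (W1,)))
    + (cond_entropy(joint, (Z,), (W1, W2, V)) - cond_entropy(joint, (X,), (W1, W2, V)))
)
print(abs(lhs - rhs) < 1e-9)
```

If the Markov structure is broken (for instance by letting the auxiliary variables depend on $Z$ directly), the two sides need not agree, which is a quick way to see where the chain is used.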

6.2 Inner Bound 

6.2.1 Code-book generation 

First, randomly generate, according to $P_{W_1}(\cdot)$, a codebook $\mathcal{C}_{w_1}$ of $2^{N(I(X;W_1)+\epsilon_1+\delta)}$ independent codewords $\{w_{1,i}\}$ of length $N$, where the coordinates are also generated i.i.d. Then, partition the codewords into $2^{N(I(X;W_1|Y)+\epsilon_2+\delta)}$ bins ($\epsilon_2 > \epsilon_1$).

Next, for each $w_{1,i}$, randomly generate a codebook $\mathcal{C}_v(w_{1,i})$ consisting of $2^{N(I(X;V|W_1)+\epsilon_v+\delta)}$ codewords $\{v_{i,j}\}$, where the generation of each coordinate is according to $P_{V|W_1}(\cdot)$, and partition this codebook into $2^{N(I(X;V|W_1,Z)+\epsilon_{v'}+\delta)}$ bins, $\mathcal{C}_v^z(w_{1,i})$ ($\epsilon_{v'} > \epsilon_v$). Each bin in the codebook of $\{v_{i,j}\}$ contains a little less than $2^{NI(Z;V|W_1)}$ codewords. Partition each such bin into sub-bins, $\mathcal{C}_v^y(w_{1,i})$, each of a size of a little less than $2^{NI(Y;V|W_1)}$. There are about $2^{N(I(Z;V|W_1) - I(Y;V|W_1))}$ such sub-bins.

For each pair $\{w_{1,i}, v_{i,j}\}$ randomly generate a codebook $\mathcal{C}_{w_2}(w_{1,i}, v_{i,j})$ consisting of $2^{N(I(X;W_2|W_1,V)+\epsilon_3+\delta)}$ codewords $\{w_{2,i,j,k}\}$, where the generation of each coordinate is according to $P_{W_2|W_1,V}(\cdot)$, and partition $\mathcal{C}_{w_2}(w_{1,i}, v_{i,j})$ into $2^{N(I(X;W_2|W_1,V,Z)+\epsilon_4+\delta)}$ bins ($\epsilon_4 > \epsilon_3$).

Now, randomly generate for each pair $(w_{1,i}, v_{i,j})$ a codebook $\mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ of $2^{N(I(X;W_3|W_1,V)+\epsilon_5+\delta)}$ codewords $\{w_{3,i,j,l}\}$ according to $P_{W_3|W_1,V}(\cdot)$ and partition $\mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ into $2^{N(I(X;W_3|W_1,V,Y)+\epsilon_6+\delta)}$ bins ($\epsilon_6 > \epsilon_5$).

Finally, for each quadruplet $\{w_{1,i}, w_{2,i,j,k}, w_{3,i,j,l}, v_{i,j}\}$, randomly generate a codebook $\mathcal{C}_{w_4}(w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l})$ of $2^{N(I(X;W_4|W_1,W_2,W_3,V)+\epsilon_7+\delta)}$ codewords $\{w_{4,i,j,k,l,m}\}$ according to $P_{W_4|W_1,W_2,W_3,V}(\cdot)$ and partition it into $2^{N(I(X;W_4|W_1,W_2,W_3,V,Z)+\epsilon_8+\delta)}$ bins ($\epsilon_8 > \epsilon_7$).

For clarity of exposition, the generation of codebooks is demonstrated in Fig. 3.

Figure 3: Achievability Scheme - Code Generation.

6.2.2 Encoding 

Given a source sequence $\mathbf{x}$, the encoder seeks a vector $w_{1,i}$ in $\mathcal{C}_{w_1}$ such that $\mathbf{x}$ and $w_{1,i}$ are jointly typical. If such $w_{1,i}$ is found, the encoder seeks in $\mathcal{C}_v(w_{1,i})$ a vector $v_{i,j}$ such that the sequences $\mathbf{x}$ and $w_{1,i}$ are jointly typical with it. The encoder proceeds this way, seeking $w_{2,i,j,k}$ in $\mathcal{C}_{w_2}(w_{1,i}, v_{i,j})$ so that $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k})$ are jointly typical. The encoder then seeks in $\mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ a codeword $w_{3,i,j,l}$ so that $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{3,i,j,l})$ are jointly typical. Due to the Markov chain $W_2 \leftrightarrow (X, W_1, V) \leftrightarrow W_3$, if the encoder manages to find such sequences, $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l})$ will be jointly typical with high probability.

If the encoder finds jointly typical sequences $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l})$, it seeks in $\mathcal{C}_{w_4}(w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l})$ a sequence $w_{4,i,j,k,l,m}$ which is jointly typical with all the above-mentioned sequences. If at any stage of its search the encoder fails to find a "good sequence", it declares an error. As is shown in the sequel, the probability of such an event is very low, due to the typicality properties of the scheme. Otherwise, i.e., if all the jointly typical sequences are found, the encoder acts as follows. At the first stage, it conveys to the decoders a single transmission consisting of the following concatenated indexes: the index $B_1$ of the bin to which $w_{1,i}$ belongs, of length about $NI(X; W_1|Y)$ bits; the index $B_2$ of the bin $\mathcal{C}_v^z(w_{1,i})$ s.t. $v_{i,j} \in \mathcal{C}_v^z(w_{1,i})$, which can be described by about $NI(X; V|W_1, Z)$ bits; and the index $B_3$ of the bin to which $w_{2,i,j,k}$ belongs, which requires about $NI(X; W_2|W_1, V, Z)$ bits. At the refinement stage, it transmits the index $B_4$ of the sub-bin $\mathcal{C}_v^y(w_{1,i})$ to which $v_{i,j}$ belongs within $\mathcal{C}_v^z(w_{1,i})$ (previously described by $B_2$), which requires about $N[I(Z; V|W_1) - I(Y; V|W_1)]$ bits, concatenated with the indexes $B_5$ and $B_6$ of the bins containing $w_{3,i,j,l}$ and $w_{4,i,j,k,l,m}$ in $\mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ and $\mathcal{C}_{w_4}(w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l})$, of about $NI(X; W_3|W_1, V, Y)$ and $NI(X; W_4|W_1, W_2, W_3, V, Z)$ bits, respectively. The transmission rates at both stages are as defined by $\mathcal{R}^*(D)_{nc}$ up to $\{\epsilon_i\}$.
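As a consistency check on this rate bookkeeping (a worked step added here for illustration, relying only on the Markov chain $(W_1, V) \leftrightarrow X \leftrightarrow Z \leftrightarrow Y$ used throughout this section), the bits spent on $V$ at the first stage together with the sub-bin index of the refinement stage add up to the Wyner-Ziv rate that the Y-decoder needs in order to recover $V$:

```latex
\begin{align*}
I(X;V|W_1,Z) + \bigl[I(Z;V|W_1) - I(Y;V|W_1)\bigr]
  &= \bigl[I(X;V|W_1) - I(Z;V|W_1)\bigr] + I(Z;V|W_1) - I(Y;V|W_1) \\
  &= I(X;V|W_1) - I(Y;V|W_1) \\
  &= I(X;V|W_1,Y).
\end{align*}
```

The first and last equalities use $I(X;V|W_1,Z) = I(X;V|W_1) - I(Z;V|W_1)$ and $I(X;V|W_1,Y) = I(X;V|W_1) - I(Y;V|W_1)$, both of which follow from the Markov chain above.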

6.2.3 Decoding 

First stage: The first decoder accesses $(B_1, B_2, B_3)$, but performs the W-Z decoding procedure with respect to $B_1$ only. Specifically, in $\mathcal{C}_{w_1}$, in the bin indexed by $B_1$, the decoder seeks the unique sequence $w_{1,i}$ that was chosen by the encoder. Due to the Markov chain $W_1 \leftrightarrow X \leftrightarrow Y$, as the block length becomes infinitely large, the decoder will find the correct sequence $w_{1,i}$ with probability tending to 1. Since in each bin in $\mathcal{C}_{w_1}$ there are fewer than $2^{NI(Y;W_1)}$ codewords, and these codewords were generated i.i.d., the probability that the bin indexed by $B_1$ contains another codeword jointly typical with $\mathbf{y}$ vanishes as $N \to \infty$.

The second decoder uses the three indexes $(B_1, B_2, B_3)$ to retrieve all three codewords chosen by the encoder. Specifically, it retrieves $w_{1,i}$ in the same way as the Y-decoder does, since, as it has access to more informative SI, it can do whatever the Y-decoder can do. Afterwards, it correctly retrieves $v_{i,j} \in \mathcal{C}_v(w_{1,i})$ in the bin indexed by $B_2$, which is possible due to the Markov chain $(V, W_1) \leftrightarrow X \leftrightarrow Z$. The Z-decoder does not find in the bin indexed by $B_2$ other codewords which are jointly typical with $\mathbf{z}$, since there are fewer than $2^{NI(Z;V|W_1)}$ codewords in that bin. Finally, following similar considerations, after retrieving $(w_{1,i}, v_{i,j})$, the Z-decoder correctly retrieves $w_{2,i,j,k} \in \mathcal{C}_{w_2}(w_{1,i}, v_{i,j})$ in the bin indexed by $B_3$.

Second Stage: Note that after the first transmission the Y-decoder is able to find all codewords $v$ which are jointly typical with $\mathbf{y}$ in the bin indexed by $B_2$ in the codebook $\mathcal{C}_v(w_{1,i})$. This is due to the Markov chain $(W_1, V) \leftrightarrow X \leftrightarrow Y$. But it cannot determine which of these codewords was chosen by the encoder, as there are more than $2^{NI(Y;V|W_1)}$ such codewords (there are a bit fewer than $2^{NI(Z;V|W_1)}$ such codewords, as is required by the W-Z coding designed for the Z-decoder). When the Y-decoder receives the index $B_4$ of $\mathcal{C}_v^y(w_{1,i})$, since $v_{i,j} \in \mathcal{C}_v^y(w_{1,i}) \subset \mathcal{C}_v^z(w_{1,i})$, it searches for $v_{i,j}$ among a group of fewer than $2^{NI(Y;V|W_1)}$ codewords, and thus it is able to retrieve $v_{i,j}$ correctly by the W-Z decoding argument. After the Y-decoder has found $v_{i,j}$, it performs W-Z decoding of the codeword $w_{3,i,j,l} \in \mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ according to the bin index $B_5$ and $(w_{1,i}, v_{i,j})$. It now improves the reconstruction of the source sequence with the aid of the triplet $(w_{1,i}, v_{i,j}, w_{3,i,j,l})$, which is possible within the defined distortion due to the typicality properties of the scheme.

The Z-decoder, which after the first step has correctly retrieved (with probability tending to 1, as $N \to \infty$) the sequences $(w_{1,i}, v_{i,j}, w_{2,i,j,k})$, makes no use of the index $B_4$, as it serves the Y-decoder only. The Z-decoder uses its knowledge of $(w_{1,i}, v_{i,j})$, as well as the fact that its SI is more informative, to correctly decode $w_{3,i,j,l}$ in the bin of $\mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ indexed by $B_5$. Finally, it uses all the codewords it has found thus far to perform conditional W-Z decoding and to find the correct codeword $w_{4,i,j,k,l,m}$ according to the index $B_6$ of a bin in $\mathcal{C}_{w_4}(w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l})$.

At each stage, after each of the decoders has found the correct codewords, it performs reconstruction of the source sequence $\mathbf{x}$. Due to the typicality properties of the scheme, i.e., $X \leftrightarrow (W_1, Y) \leftrightarrow \hat{X}_{y,1}$, $X \leftrightarrow (W_1, W_2, V, Z) \leftrightarrow \hat{X}_{z,1}$, $X \leftrightarrow (W_1, W_3, V, Y) \leftrightarrow \hat{X}_{y,2}$ and $X \leftrightarrow (W_1, W_2, W_3, W_4, V, Z) \leftrightarrow \hat{X}_{z,2}$, where $\hat{X}_{y,k}$ and $\hat{X}_{z,k}$ denote the stage-$k$ reconstructions at decoders Y and Z, the distortion constraints are satisfied at both decoders.

6.2.4 Analysis of Probability of Error 

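We now turn to the analysis of the error probability. For each $\mathbf{x}$ and a particular choice of the code $\mathcal{C}_{w_1}$ and related choices of $\{\mathcal{C}_v(\cdot), \mathcal{C}_{w_2}(\cdot), \mathcal{C}_{w_3}(\cdot), \mathcal{C}_{w_4}(\cdot)\}$, the possible causes for an error message are:

1. $\mathbf{x} \notin T_{P_X}$. Let the probability of this event be defined as $P_{e_1}$.

2. $\mathbf{x} \in T_{P_X}$, but in the codebook $\mathcal{C}_{w_1}$ $\nexists\, w_{1,i}$ s.t. $(\mathbf{x}, w_{1,i}) \in T^{2\delta}_{P_{XW_1}}$. Let the probability of this event be defined as $P_{e_2}$.

3. $\mathbf{x} \in T_{P_X}$, and the codebook $\mathcal{C}_{w_1}$ contains $w_{1,i}$ s.t. $(\mathbf{x}, w_{1,i}) \in T^{2\delta}_{P_{XW_1}}$, but $\nexists\, v_{i,j} \in \mathcal{C}_v(w_{1,i})$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}) \in T^{3\delta}_{P_{XW_1V}}$. Let the probability of this event be defined as $P_{e_3}$.

4. $\mathbf{x} \in T_{P_X}$, the codebook $\mathcal{C}_{w_1}$ contains $w_{1,i}$ s.t. $(\mathbf{x}, w_{1,i}) \in T^{2\delta}_{P_{XW_1}}$, and also the codebook $\mathcal{C}_v(w_{1,i})$ contains $v_{i,j}$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}) \in T^{3\delta}_{P_{XW_1V}}$, but $\nexists\, w_{2,i,j,k} \in \mathcal{C}_{w_2}(w_{1,i}, v_{i,j})$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k}) \in T^{4\delta}_{P_{XW_1VW_2}}$. Let the probability of this event be defined as $P_{e_4}$.

5. $\mathbf{x} \in T_{P_X}$, the codebook $\mathcal{C}_{w_1}$ contains $w_{1,i}$ s.t. $(\mathbf{x}, w_{1,i}) \in T^{2\delta}_{P_{XW_1}}$, the codebook $\mathcal{C}_v(w_{1,i})$ contains $v_{i,j}$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}) \in T^{3\delta}_{P_{XW_1V}}$, but $\nexists\, w_{3,i,j,l} \in \mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{3,i,j,l}) \in T^{4\delta}_{P_{XW_1VW_3}}$. Let the probability of this event be defined as $P_{e_5}$.

6. $\mathbf{x} \in T_{P_X}$, the codebook $\mathcal{C}_{w_1}$ contains $w_{1,i}$ s.t. $(\mathbf{x}, w_{1,i}) \in T^{2\delta}_{P_{XW_1}}$, the codebook $\mathcal{C}_v(w_{1,i})$ contains $v_{i,j}$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}) \in T^{3\delta}_{P_{XW_1V}}$, and the codebooks $\mathcal{C}_{w_2}(w_{1,i}, v_{i,j})$ and $\mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ contain $w_{2,i,j,k}$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k}) \in T^{4\delta}_{P_{XW_1VW_2}}$ and $w_{3,i,j,l}$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{3,i,j,l}) \in T^{4\delta}_{P_{XW_1VW_3}}$, respectively, but $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l}) \notin T^{5\delta}_{P_{XW_1VW_2W_3}}$. Let the probability of this event be defined as $P_{e_6}$.

7. $\mathbf{x} \in T_{P_X}$, the codebook $\mathcal{C}_{w_1}$ contains $w_{1,i}$ s.t. $(\mathbf{x}, w_{1,i}) \in T^{2\delta}_{P_{XW_1}}$, the codebook $\mathcal{C}_v(w_{1,i})$ contains $v_{i,j}$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}) \in T^{3\delta}_{P_{XW_1V}}$, and the codebooks $\mathcal{C}_{w_2}(w_{1,i}, v_{i,j})$ and $\mathcal{C}_{w_3}(w_{1,i}, v_{i,j})$ contain $w_{2,i,j,k}$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k}) \in T^{4\delta}_{P_{XW_1VW_2}}$ and $w_{3,i,j,l}$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{3,i,j,l}) \in T^{4\delta}_{P_{XW_1VW_3}}$, respectively, and also $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l}) \in T^{5\delta}_{P_{XW_1VW_2W_3}}$, but $\nexists\, w_{4,i,j,k,l,m} \in \mathcal{C}_{w_4}(w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l})$ s.t. $(\mathbf{x}, w_{1,i}, v_{i,j}, w_{2,i,j,k}, w_{3,i,j,l}, w_{4,i,j,k,l,m}) \in T^{6\delta}_{P_{XW_1VW_2W_3W_4}}$. Let the probability of this event be defined as $P_{e_7}$.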

Note that if none of those events occur, then, for sufficiently large $N$, by the Markov Lemma [12, p. 436, Lemma 14.8.1] applied twice, the following is satisfied: with high probability, $(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{X}}_{y,k})$ are jointly typical and $(\mathbf{X}, \mathbf{Z}, \hat{\mathbf{X}}_{z,k})$ are jointly typical at both stages $k = 1, 2$, where $\hat{\mathbf{X}}_{y,k}$ and $\hat{\mathbf{X}}_{z,k}$ denote the stage-$k$ reconstructions at decoders Y and Z.

1. The first application of the Markov Lemma occurs due to the Markov chain $(Y, Z) \leftrightarrow X \leftrightarrow (W_1, V, W_2, W_3, W_4)$: note that, by the way of creation, $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ are jointly typical with high probability and also, with high probability, the RVs $(W_1, W_2, W_3, W_4, V)$ and $X$ are jointly typical. Therefore, by the Markov Lemma, all the sequences $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$, $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{W}_3$, $\mathbf{W}_4$ and $\mathbf{V}$ are also jointly typical with high probability. And so, the SIs are jointly typical with the auxiliary RVs at both stages of communication.

2. Also, note that due to the fact that the source is memoryless and by the way of creation of the reconstructions, the following Markov chains hold at the first stage: $X \leftrightarrow (Y, W_1) \leftrightarrow \hat{X}_{y,1}$ and $X \leftrightarrow (Z, W_1, V, W_2) \leftrightarrow \hat{X}_{z,1}$. Similarly, at the second stage, $X \leftrightarrow (Y, W_1, V, W_3) \leftrightarrow \hat{X}_{y,2}$ and $X \leftrightarrow (Z, W_1, V, W_2, W_3, W_4) \leftrightarrow \hat{X}_{z,2}$. By the second application of the Markov Lemma, we obtain that with high probability $\mathbf{X}$ is jointly typical with $\hat{\mathbf{X}}_{y,k}$ and $\hat{\mathbf{X}}_{z,k}$ at both stages. The probability that one or more of the above typicality relations do not hold vanishes as $N$ becomes infinitely large. The joint typicality of $(\mathbf{X}, \hat{\mathbf{X}}_{y,k})$ and $(\mathbf{X}, \hat{\mathbf{X}}_{z,k})$ implies that the distortion constraints (33) - (36) are satisfied when $N$ is large enough (see [4, Section 6] for explicit derivations).

It remains to show that the probability of sending an error message vanishes when $N$ is large enough. The average probability of error $P_e$ is bounded by

$$P_e \le P_{e_1} + P_{e_2} + P_{e_3} + P_{e_4} + P_{e_5} + P_{e_6} + P_{e_7}. \tag{157}$$

The fact that $P_{e_1} \to 0$ follows from the properties of typical sequences [12]. As for $P_{e_2}$, we have:

$$P_{e_2} = \prod_{k=1}^{|\mathcal{C}_{w_1}|} \Pr\left\{(\mathbf{x}, W_{1,k}) \notin T^{2\delta}_{P_{XW_1}}\right\}. \tag{158}$$

Now, for every $k$:

$$\Pr\left\{(\mathbf{x}, W_{1,k}) \notin T^{2\delta}_{P_{XW_1}}\right\} = 1 - \Pr\left\{(\mathbf{x}, W_{1,k}) \in T^{2\delta}_{P_{XW_1}}\right\} \le 1 - 2^{-N[I(X;W_1)+\epsilon_1]}, \tag{159}$$

where the last inequality follows from the sizes of the typical sets as given in [12]. Substitution of (159) into (158) and application of the well-known inequality $(1 - v)^N \le \exp(-vN)$ provides us with the following upper bound for $N \to \infty$:

$$P_{e_2} \le \left[1 - 2^{-N[I(X;W_1)+\epsilon_1]}\right]^{|\mathcal{C}_{w_1}|} \le \exp\left\{-|\mathcal{C}_{w_1}| \cdot 2^{-N[I(X;W_1)+\epsilon_1]}\right\} \to 0, \tag{160}$$

double-exponentially rapidly since $|\mathcal{C}_{w_1}| = 2^{N(I(X;W_1)+\epsilon_1+\delta)}$.



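To estimate $P_{e_3}$, we repeat the technique of the previous step:

$$P_{e_3} = \prod_{j=1}^{|\mathcal{C}_v|} \Pr\left\{(\mathbf{x}, w_{1,i}, V_{i,j}) \notin T^{3\delta}_{P_{XW_1V}}\right\}. \tag{161}$$

Again, by the property of the typical sequences, for every $j$:

$$\Pr\left\{(\mathbf{x}, w_{1,i}, V_{i,j}) \notin T^{3\delta}_{P_{XW_1V}}\right\} \le 1 - 2^{-N[I(X;V|W_1)+\epsilon_v]}, \tag{162}$$

and therefore, substitution of (162) into (161) gives

$$P_{e_3} \le \left[1 - 2^{-N[I(X;V|W_1)+\epsilon_v]}\right]^{|\mathcal{C}_v|} \le \exp\left\{-|\mathcal{C}_v| \cdot 2^{-N[I(X;V|W_1)+\epsilon_v]}\right\} \to 0, \tag{163}$$

double-exponentially rapidly since $|\mathcal{C}_v| = 2^{N(I(X;V|W_1)+\epsilon_v+\delta)}$.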

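To estimate $P_{e_4}$, the technique of the previous step is again repeated:

$$P_{e_4} = \prod_{k=1}^{|\mathcal{C}_{w_2}|} \Pr\left\{(\mathbf{x}, w_{1,i}, v_{i,j}, W_{2,i,j,k}) \notin T^{4\delta}_{P_{XW_1VW_2}}\right\}. \tag{164}$$

Still, by the property of the typical sequences, for every $k$:

$$\Pr\left\{(\mathbf{x}, w_{1,i}, v_{i,j}, W_{2,i,j,k}) \notin T^{4\delta}_{P_{XW_1VW_2}}\right\} \le 1 - 2^{-N[I(X;W_2|W_1,V)+\epsilon_3]}, \tag{165}$$

and therefore, substitution of (165) into (164) gives

$$P_{e_4} \le \left[1 - 2^{-N[I(X;W_2|W_1,V)+\epsilon_3]}\right]^{|\mathcal{C}_{w_2}|} \le \exp\left\{-|\mathcal{C}_{w_2}| \cdot 2^{-N[I(X;W_2|W_1,V)+\epsilon_3]}\right\} \to 0, \tag{166}$$

double-exponentially rapidly since $|\mathcal{C}_{w_2}| = 2^{N(I(X;W_2|W_1,V)+\epsilon_3+\delta)}$.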

Similarly to the previous step, we show that $P_{e_5}$ and $P_{e_7}$ vanish as well when $N$ is large enough, using the fact that $|\mathcal{C}_{w_3}| = 2^{N(I(X;W_3|W_1,V)+\epsilon_5+\delta)}$ and $|\mathcal{C}_{w_4}| = 2^{N(I(X;W_4|W_1,W_2,W_3,V)+\epsilon_7+\delta)}$, respectively.

The proof for $P_{e_6}$ is different and it uses the Markov lemma [12, p. 436, Lemma 14.8.1]. In the previous steps we showed that the quadruples $(X, W_1, V, W_2)$ and $(X, W_1, V, W_3)$ are jointly typical with high probability. Now, due to the Markov lemma applied to the Markov chain $W_2 \leftrightarrow (X, W_1, V) \leftrightarrow W_3$, the probability that $(X, W_1, V, W_2, W_3)$ are not jointly typical tends to zero as $N$ approaches infinity. Therefore, $P_{e_6} \to 0$ when $N \to \infty$.

Since $P_{e_s} \to 0$ for $s \in \{1, \ldots, 7\}$, their sum tends to zero as well, implying that there exists at least one choice of a codebook $\mathcal{C}_{w_1}$ and related choices of sets $\{\mathcal{C}_v\}$, $\{\mathcal{C}_{w_2}\}$, $\{\mathcal{C}_{w_3}\}$, $\{\mathcal{C}_{w_4}\}$ that give rise to reliable source reconstruction at both stages with communication rates $R_1$ and $R_2$.